Preface


This edition of Fundamentals of Computer Graphics adds four new contributed 
chapters and contains substantial reorganizations and improvements to the core 
material. The new chapters add coverage of implicit modeling and of two important
graphics applications: games and information visualization. The fourth new
contributed chapter is a major upgrade to the material on color science. As with 
the chapters added in the second edition, we have chosen the contributors both for 
their expertise and for their clear way of expressing ideas. 

We have made a number of changes to the early chapters of the book, integrating
the second author’s experience teaching introductory graphics at Cornell using
the first and second editions. Most of these have been revised and updated,
particularly the chapters on images, viewing, ray tracing, the graphics pipeline, and the
material on triangle meshes. Some of the original material from these chapters 
has been reorganized, sometimes with topics appearing in different chapters than 
in the previous editions. 

Our aim in this reorganization has been to move the elementary material towards
the beginning. In our thinking, Chapters 2 through 8 constitute the “core
core,” taking the straight and narrow path through what is absolutely required
for understanding how images get onto the screen using the complementary
approaches of ray tracing and rasterization. Ray tracing is covered first, since it is
the simplest way to generate images of 3D scenes, followed by the mathematical
machinery required for the graphics pipeline, then the pipeline itself. After
that, the “outer core” covers other topics that would commonly be included in an
introductory class. For example, ray tracing is split into two chapters, with the
more advanced material now in Chapter 13. The material on spatial data
structures (some formerly under Ray Tracing and Hidden Surfaces) is consolidated in
Chapter 12 together with an expanded section on triangle meshes. 

In all these revisions, we have endeavored to retain the informal, intuitive 
style of presentation that characterizes the earlier editions, while at the same time 
improving consistency, precision, and completeness. We hope the reader will find 
the result is a better platform for a variety of courses in computer graphics. 

About the Cover 

The cover image is from Tiger in the Water by J. W. Baker (brushed and
airbrushed acrylic on canvas, 16” by 20”, www.jwbart.com).

The subject of a tiger is a reference to a wonderful talk given by Alain Fournier 
(1943-2000) at the Cornell Workshop in 1998. His talk was an evocative verbal 
description of the movements of a tiger. He summarized his point: 

Even though modelling and rendering in computer graphics have 
been improved tremendously in the past 35 years, we are still not 
at the point where we can model automatically a tiger swimming in 
the river in all its glorious details. By automatically I mean in a way 
that does not need careful manual tweaking by an artist/expert. 

The bad news is that we have still a long way to go. 

The good news is that we have still a long way to go. 


Online Resources 

The web site for this book is http://www.cs.cornell.edu/~srm/fcg3/. We will
continue to maintain a list of errata and links to courses that use the book, as well as
teaching materials that match the book’s style. Most of the figures in this book are
in Adobe Illustrator format, and we would be happy to convert specific figures into
portable formats on request. Please feel free to contact us at shirley@cs.utah.edu
or srm@cs.cornell.edu.

Introduction


The term computer graphics describes any use of computers to create and
manipulate images. This book introduces the algorithmic and mathematical tools
that can be used to create all kinds of images—realistic visual effects, informative
technical illustrations, or beautiful computer animations. Graphics can be two- or
three-dimensional; images can be completely synthetic or can be produced by
manipulating photographs. This book is about the fundamental algorithms and
mathematics, especially those used to produce synthetic images of three-dimensional
objects and scenes.

Actually doing computer graphics inevitably requires knowing about
specific hardware, file formats, and usually a graphics API (see Section 1.3) or two.
Computer graphics is a rapidly evolving field, so the specifics of that knowledge
are a moving target. Therefore, in this book we do our best to avoid depending
on any specific hardware or API. Readers are encouraged to supplement the text
with relevant documentation for their software and hardware environment.
Fortunately, the culture of computer graphics has enough standard terminology and
concepts that the discussion in this book should map nicely to most environments.

This chapter defines some basic terminology, and provides some historical 
background as well as information sources related to computer graphics. 

1.1 Graphics Areas 

Imposing categories on any field is dangerous, but most graphics practitioners 
would agree on the following major areas of computer graphics: 


API: application program interface.

• Modeling deals with the mathematical specification of shape and
appearance properties in a way that can be stored on the computer. For example,
a coffee mug might be described as a set of ordered 3D points along with 
some interpolation rule to connect the points and a reflection model that 
describes how light interacts with the mug. 

• Rendering is a term inherited from art and deals with the creation of 
shaded images from 3D computer models. 

• Animation is a technique to create an illusion of motion through sequences 
of images. Animation uses modeling and rendering but adds the key issue 
of movement over time, which is not usually dealt with in basic modeling 
and rendering. 

There are many other areas that involve computer graphics, and whether they are 
core graphics areas is a matter of opinion. These will all be at least touched on in 
the text. Such related areas include the following: 

• User interaction deals with the interface between input devices such as
mice and tablets, the application, feedback to the user in imagery, and
other sensory feedback. Historically, this area is associated with
graphics largely because graphics researchers had some of the earliest access to
the input/output devices that are now ubiquitous.

• Virtual reality attempts to immerse the user into a 3D virtual world. This
typically requires at least stereo graphics and response to head motion.
For true virtual reality, sound and force feedback should be provided as
well. Because this area requires advanced 3D graphics and advanced
display technology, it is often closely associated with graphics.

• Visualization attempts to give users insight into complex information via
visual display. Often there are graphic issues to be addressed in a
visualization problem.

• Image processing deals with the manipulation of 2D images and is used in 
both the fields of graphics and vision. 

• 3D scanning uses range-finding technology to create measured 3D models. 
Such models are useful for creating rich visual imagery, and the processing 
of such models often requires graphics algorithms. 

• Computational photography is the use of computer graphics, computer
vision, and image processing methods to enable new ways of photographically
capturing objects, scenes, and environments.

1.2 Major Applications

Almost any endeavor can make some use of computer graphics, but the major 
consumers of computer graphics technology include the following industries: 

• Video games increasingly use sophisticated 3D models and rendering
algorithms.

• Cartoons are often rendered directly from 3D models. Many traditional 
2D cartoons use backgrounds rendered from 3D models, which allows a 
continuously moving viewpoint without huge amounts of artist time. 

• Visual effects use almost all types of computer graphics technology.
Almost every modern film uses digital compositing to superimpose
backgrounds with separately filmed foregrounds. Many films also use 3D
modeling and animation to create synthetic environments, objects, and even
characters that most viewers will never suspect are not real.

• Animated films use many of the same techniques that are used for visual 
effects, but without necessarily aiming for images that look real. 

• CAD/CAM stands for computer-aided design and computer-aided
manufacturing. These fields use computer technology to design parts and
products on the computer and then, using these virtual designs, to guide the
manufacturing process. For example, many mechanical parts are designed 
in a 3D computer modeling package and then automatically produced on a 
computer-controlled milling device. 

• Simulation can be thought of as accurate video gaming. For example, a 
flight simulator uses sophisticated 3D graphics to simulate the experience 
of flying an airplane. Such simulations can be extremely useful for initial 
training in safety-critical domains such as driving, and for scenario training 
for experienced users such as specific fire-fighting situations that are too 
costly or dangerous to create physically. 

• Medical imaging creates meaningful images of scanned patient data. For 
example, a computed tomography (CT) dataset is composed of a large 3D 
rectangular array of density values. Computer graphics is used to create 
shaded images that help doctors extract the most salient information from 
such data. 

• Information visualization creates images of data that do not necessarily
have a “natural” visual depiction. For example, the temporal trend of the
price of ten different stocks does not have an obvious visual depiction, but
clever graphing techniques can help humans see the patterns in such data.


1.3 Graphics APIs 

A key part of using graphics libraries is dealing with a graphics API. An
application program interface (API) is a standard collection of functions to perform a set
of related operations, and a graphics API is a set of functions that perform basic 
operations such as drawing images and 3D surfaces into windows on the screen. 

Every graphics program needs to be able to use two related APIs: a graphics 
API for visual output and a user-interface API to get input from the user. There 
are currently two dominant paradigms for graphics and user-interface APIs. The 
first is the integrated approach, exemplified by Java, where the graphics and
user-interface toolkits are integrated and portable packages that are fully standardized
and supported as part of the language. The second is represented by Direct3D 
and OpenGL, where the drawing commands are part of a software library tied to 
a language such as C++, and the user-interface software is an independent entity 
that might vary from system to system. In this latter approach, it is problematic 
to write portable code, although for simple programs it may be possible to use a 
portable library layer to encapsulate the system-specific user-interface code.

Whatever your choice of API, the basic graphics calls will be largely the same, 
and the concepts of this book will apply. 


1.4 Graphics Pipeline 

Every desktop computer today has a powerful 3D graphics pipeline. This is a
special software/hardware subsystem that efficiently draws 3D primitives in
perspective. Usually these systems are optimized for processing 3D triangles with
shared vertices. The basic operations in the pipeline map the 3D vertex locations 
to 2D screen positions and shade the triangles so that they both look realistic and 
appear in proper back-to-front order. 

Although drawing the triangles in valid back-to-front order was once the most 
important research issue in computer graphics, it is now almost always solved 
using the z-buffer, which uses a special memory buffer to solve the problem in a
brute-force manner. 
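
The brute-force idea can be sketched in a few lines. This is a minimal illustration rather than how any particular pipeline is implemented: the function name `draw_fragments`, the tuple layout, and the convention that a smaller depth value means "closer to the viewer" are all assumptions made for the example.

```python
import math

def draw_fragments(width, height, fragments):
    """Resolve visibility per pixel with a depth buffer.

    fragments: iterable of (x, y, depth, color) in ANY order.
    Assumption for this sketch: smaller depth means closer to the viewer.
    """
    depth = [[math.inf] * width for _ in range(height)]  # nothing drawn yet
    color = [[None] * width for _ in range(height)]
    for x, y, z, c in fragments:
        # Keep the fragment only if it is closer than what is stored.
        if z < depth[y][x]:
            depth[y][x] = z
            color[y][x] = c
    return color
```

Because each pixel only remembers the closest depth seen so far, the result does not depend on the order in which the triangles (here, their fragments) are submitted.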

It turns out that the geometric manipulation used in the graphics pipeline can
be accomplished almost entirely in a 4D coordinate space composed of three
traditional geometric coordinates and a fourth homogeneous coordinate that helps
with perspective viewing. These 4D coordinates are manipulated using 4 × 4
matrices and 4-vectors. The graphics pipeline, therefore, contains much
machinery for efficiently processing and composing such matrices and vectors. This
4D coordinate system is one of the most subtle and beautiful constructs used in 
computer science, and it is certainly the biggest intellectual hurdle to jump when 
learning computer graphics. A big chunk of the first part of every graphics book 
deals with these coordinates. 
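
As a small taste of what these coordinates buy us, the sketch below multiplies 4-vectors by 4 × 4 matrices and then divides by the fourth coordinate. The helper names and both matrices are illustrative assumptions; in particular, `toy_perspective` is a deliberately simple matrix that copies z into the fourth coordinate, not the projection matrix developed later in the book.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def homogenize(v):
    """Divide by the fourth (homogeneous) coordinate to get a 3D point."""
    x, y, z, w = v
    return [x / w, y / w, z / w]

# A translation by (2, 3, 4) written as a 4x4 matrix.
translate = [[1, 0, 0, 2],
             [0, 1, 0, 3],
             [0, 0, 1, 4],
             [0, 0, 0, 1]]

# Toy matrix (assumption): copies z into w, so homogenize divides x, y by z.
toy_perspective = [[1, 0, 0, 0],
                   [0, 1, 0, 0],
                   [0, 0, 1, 0],
                   [0, 0, 1, 0]]

moved = mat_vec(translate, [1.0, 1.0, 1.0, 1.0])        # a location (w = 1)
direction = mat_vec(translate, [1.0, 0.0, 0.0, 0.0])    # a direction (w = 0)
projected = homogenize(mat_vec(toy_perspective, [2.0, 4.0, 2.0, 1.0]))
```

The location is shifted by the translation while the direction is left alone, and the division by the fourth coordinate is what makes more distant points land closer to the center of the image.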

The speed at which images can be generated depends strongly on the number 
of triangles being drawn. Because interactivity is more important in many
applications than visual quality, it is worthwhile to minimize the number of triangles
used to represent a model. In addition, if the model is viewed in the distance, 
fewer triangles are needed than when the model is viewed from a closer distance. 
This suggests that it is useful to represent a model with a varying level of detail 
(LOD). 
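
One simple way to act on this is to keep several versions of a model and pick one by viewing distance. The sketch below assumes a hypothetical table of (maximum distance, triangle count) pairs; the thresholds are made up for illustration, and real systems often use screen-space size or error metrics instead.

```python
def choose_lod(distance, lods):
    """Pick a level of detail for a viewing distance.

    lods: list of (max_distance, triangle_count), sorted by increasing
    max_distance (hypothetical values chosen by the application).
    """
    for max_distance, triangle_count in lods:
        if distance <= max_distance:
            return triangle_count
    # Farther than every threshold: fall back to the coarsest version.
    return lods[-1][1]

# Hypothetical levels: detailed up close, coarse far away.
LODS = [(10.0, 10000), (50.0, 2000), (200.0, 300)]
```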

1.5 Numerical Issues 

Many graphics programs are really just 3D numerical codes. Numerical issues
are often crucial in such programs. In the “old days,” it was very difficult to
handle such issues in a robust and portable manner because machines had different
internal representations for numbers, and even worse, handled exceptions in
different and incompatible ways. Fortunately, almost all modern computers conform
to the IEEE floating-point standard (IEEE Standards Association, 1985). This
allows the programmer to make many convenient assumptions about how certain
numeric conditions will be handled.

Although IEEE floating-point has many features that are valuable when coding
numeric algorithms, there are only a few that are crucial to know for most
situations encountered in graphics. First, and most important, is to understand 
that there are three “special” values for real numbers in IEEE floating-point: 

1. infinity (∞). This is a valid number that is larger than all other valid
numbers.

2. minus infinity (−∞). This is a valid number that is smaller than all other
valid numbers.

3. not a number (NaN). This is an invalid number that arises from an
operation with undefined consequences, such as zero divided by zero.

The designers of IEEE floating-point made some decisions that are extremely
convenient for programmers. Many of these relate to the three special values
above in handling exceptions such as division by zero. In these cases an exception
is logged, but in many cases the programmer can ignore that. Specifically, for any
positive real number a, the following rules involving division by infinite values
hold:

+a/(+∞) = +0
−a/(+∞) = −0
+a/(−∞) = −0
−a/(−∞) = +0

IEEE floating-point has two representations for zero, one that is treated as
positive and one that is treated as negative. The distinction between −0 and +0
only occasionally matters, but it is worth keeping in mind for those occasions
when it does.

Other operations involving infinite values behave the way one would expect.
Again for positive a, the behavior is:

∞ + ∞ = +∞
∞ − ∞ = NaN
∞ × ∞ = ∞
∞/∞ = NaN
∞/a = ∞
∞/0 = ∞
0/0 = NaN

The rules in a Boolean expression involving infinite values are as expected:

1. All finite valid numbers are less than +∞.

2. All finite valid numbers are greater than −∞.

3. −∞ is less than +∞.

The rules involving expressions that have NaN values are simple:

1. Any arithmetic expression that includes NaN results in NaN.

2. Any Boolean expression involving NaN is false.

Perhaps the most useful aspect of IEEE floating-point is how divide-by-zero is
handled; for any positive real number a, the following rules involving division by
zero values hold:

+a/+0 = +∞
−a/+0 = −∞

Some care must be taken if negative zero (−0) might arise.

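
Most of these rules can be checked directly. The sketch below uses Python, whose floats are IEEE doubles on common platforms; the variable names are ours. One caveat: Python deliberately raises ZeroDivisionError for x / 0.0 instead of returning ±∞, so the divide-by-zero rules are not exercised here. Note also that in IEEE arithmetic the inequality test (!=) is the one comparison that is true when NaN is involved.

```python
import math

inf, nan = math.inf, math.nan

# Dividing a finite value by an infinity yields a signed zero.
pos_zero = 1.0 / inf
neg_zero = -1.0 / inf
# The two zeros compare equal; copysign exposes the sign bit.
zero_signs = (math.copysign(1.0, pos_zero), math.copysign(1.0, neg_zero))

# Arithmetic on infinities.
sums_and_products = (inf + inf, inf * inf, inf / 2.0)   # all +inf
undefined = (inf - inf, inf / inf)                       # both NaN

# Comparisons with infinities and NaN.
ordered = (1e308 < inf, -1e308 > -inf, -inf < inf)       # all True
nan_compare = (nan < 1.0, nan > 1.0, nan == nan)         # all False
nan_not_equal = nan != nan                               # True: the exception
```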

There are many numeric computations that become much simpler if the
programmer takes advantage of the IEEE rules. For example, consider the
expression:

a = 1 / (1/b + 1/c)

Such expressions arise with resistors and lenses. If divide-by-zero resulted in a
program crash (as was true in many systems before IEEE floating-point), then
two if statements would be required to check for small or zero values of b or c.
Instead, with IEEE floating-point, if b or c is zero, we will get a zero value for a as
desired. Another common technique to avoid special checks is to take advantage
of the Boolean properties of NaN. Consider the following code segment:

a = f(x)
if (a > 0) then
    do something

Here, the function f may return “ugly” values such as ∞ or NaN, but the if
condition is still well-defined: it is false for a = NaN or a = −∞ and true for
a = +∞. With care in deciding which values are returned, often the if can make
the right choice, with no special checks needed. This makes programs smaller,
more robust, and more efficient.
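
Both techniques can be tried out in a short sketch. We take the expression to be the reciprocal-sum form that arises with resistors and lenses, a = 1/(1/b + 1/c). Because Python raises ZeroDivisionError for float division by zero rather than following the IEEE default, the sketch routes divisions through a small helper, `ieee_div`, which is our own stand-in for what the hardware does in languages such as C or C++; the names `ieee_div`, `parallel`, and `is_positive` are illustrative only.

```python
import math

def ieee_div(x: float, y: float) -> float:
    """Divide as IEEE hardware does by default: x/±0 is ±inf, 0/0 is NaN."""
    try:
        return x / y
    except ZeroDivisionError:
        if x == 0.0 or math.isnan(x):
            return math.nan
        # The result's sign combines the signs of both operands (incl. -0.0).
        sign = math.copysign(1.0, x) * math.copysign(1.0, y)
        return math.copysign(math.inf, sign)

def parallel(b: float, c: float) -> float:
    """a = 1 / (1/b + 1/c), with no special-case checks for zero."""
    return ieee_div(1.0, ieee_div(1.0, b) + ieee_div(1.0, c))

def is_positive(a: float) -> bool:
    """The 'if (a > 0)' test: False for NaN and -inf, True for +inf."""
    return a > 0
```

With b = 0 the inner division produces +∞, the sum stays +∞, and the outer division returns 0, so no if statements are needed. As the margin note warns, a negative zero would turn that +∞ into −∞, which is the case to watch for.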


1.6 Efficiency 

There are no magic rules for making code more efficient. Efficiency is achieved
through careful tradeoffs, and these tradeoffs are different for different
architectures. However, for the foreseeable future, a good heuristic is that programmers
should pay more attention to memory access patterns than to operation counts.
This is the opposite of the best heuristic of two decades ago. This switch has
occurred because the speed of memory has not kept pace with the speed of
processors. Since that trend continues, the importance of limited and coherent memory
access for optimization should only increase.

A reasonable approach to making code fast is to proceed in the following 
order, taking only those steps which are needed: 

1. Write the code in the most straightforward way possible. Compute
intermediate results as needed on the fly rather than storing them.

2. Compile in optimized mode. 

3. Use whatever profiling tools exist to find critical bottlenecks.

4. Examine data structures to look for ways to improve locality. If possible,
make data unit sizes match the cache/page size on the target architecture.

5. If profiling reveals bottlenecks in numeric computations, examine the
assembly code generated by the compiler for missed efficiencies. Rewrite
source code to solve any problems you find. 

The most important of these steps is the first one. Most “optimizations” make the 
code harder to read without speeding things up. In addition, time spent upfront 
optimizing code is usually better spent correcting bugs or adding features. Also, 
beware of suggestions from old texts; some classic tricks such as using integers 
instead of reals may no longer yield speed because modern CPUs can usually 
perform floating-point operations just as fast as they perform integer operations. 
In all situations, profiling is needed to be sure of the merit of any optimization for 
a specific machine and compiler. 
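
What "use whatever profiling tools exist" looks like depends entirely on your environment. As one hedged illustration, the sketch below uses the cProfile and pstats modules from Python's standard library; `slow_sum` is a made-up stand-in for a hot loop, and `profile_call` is our own convenience wrapper, not part of any library.

```python
import cProfile
import io
import pstats

def slow_sum(n: int) -> int:
    """Stand-in for an expensive inner loop (hypothetical workload)."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def profile_call(func, *args):
    """Run func under the profiler; return its result and a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)  # five costliest entries
    return result, stream.getvalue()
```

Calling `profile_call(slow_sum, 1000)` returns the sum together with a report listing where the time went, which is the evidence to gather before changing any code.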


1.7 Designing and Coding Graphics Programs 

Certain common strategies are often useful in graphics programming. In this 
section we provide some advice that you may find helpful as you implement the 
methods you learn about in this book. 


1.7.1 Class Design 


A key part of any graphics program is to have good classes or routines for
geometric entities such as vectors and matrices, as well as graphics entities such as RGB
colors and images. These routines should be made as clean and efficient as
possible. A universal design question is whether locations and displacements should
be separate classes because they have different operations, e.g., a location
multiplied by one-half makes no geometric sense while one-half of a displacement
does (Goldman, 1985; DeRose, 1989). There is little agreement on this question,
which can spur hours of heated debate among graphics practitioners, but for the
sake of example let’s assume we will not make the distinction.

I believe strongly in the KISS (“keep it simple, stupid”) principle, and in that
light the argument for two classes is not compelling enough to justify the added
complexity. —P.S.

I like keeping points and vectors separate because it makes code more readable
and can let the compiler catch some bugs. —S.M.

This implies that some basic classes to be written include: 

• vector2. A 2D vector class that stores an x- and y-component. It should
store these components in a length-2 array so that an indexing operator can
be well supported. You should also include operations for vector addition,
vector subtraction, dot product, cross product, scalar multiplication, and
scalar division.

• vector3. A 3D vector class analogous to vector2.

• hvector. A homogeneous vector with four components (see Chapter 7). 

• rgb. An RGB color that stores three components. You should also include 
operations for RGB addition, RGB subtraction, RGB multiplication, scalar 
multiplication, and scalar division. 

• transform. A 4 × 4 matrix for transformations. You should include a
matrix multiply and member functions to apply to locations, directions, and 
surface normal vectors. As shown in Chapter 6, these are all different. 

• image. A 2D array of RGB pixels with an output operation. 
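
To make the list concrete, here is a minimal sketch of what the 3D vector class might look like. The class name `Vec3`, the attribute `e`, and the choice of Python operator overloads are assumptions for illustration; the text above only prescribes array storage with an indexing operator and the listed operations.

```python
class Vec3:
    """Illustrative 3D vector; components live in a length-3 list."""

    def __init__(self, x: float, y: float, z: float) -> None:
        self.e = [float(x), float(y), float(z)]

    def __getitem__(self, i: int) -> float:      # indexing operator
        return self.e[i]

    def __add__(self, o: "Vec3") -> "Vec3":      # vector addition
        return Vec3(self[0] + o[0], self[1] + o[1], self[2] + o[2])

    def __sub__(self, o: "Vec3") -> "Vec3":      # vector subtraction
        return Vec3(self[0] - o[0], self[1] - o[1], self[2] - o[2])

    def __mul__(self, s: float) -> "Vec3":       # scalar multiplication
        return Vec3(self[0] * s, self[1] * s, self[2] * s)

    __rmul__ = __mul__                           # allows 2 * v as well as v * 2

    def __truediv__(self, s: float) -> "Vec3":   # scalar division
        return self * (1.0 / s)

    def dot(self, o: "Vec3") -> float:
        return self[0] * o[0] + self[1] * o[1] + self[2] * o[2]

    def cross(self, o: "Vec3") -> "Vec3":
        return Vec3(self[1] * o[2] - self[2] * o[1],
                    self[2] * o[0] - self[0] * o[2],
                    self[0] * o[1] - self[1] * o[0])
```

Following the simplifying assumption made above, the same class serves for both locations and displacements.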

In addition, you might or might not want to add classes for intervals, orthonormal
bases, and coordinate frames.

You might also consider a special class for unit-length vectors, although I have
found them more pain than they are worth. —P.S.


1.7.2 Float vs. Double

Modern architecture suggests that keeping memory use down and maintaining
coherent memory access are the keys to efficiency. This suggests using
single-precision data. However, avoiding numerical problems suggests using
double-precision arithmetic. The tradeoffs depend on the program, but it is nice to have a
default in your class definitions.

I suggest using doubles for geometric computation and floats for color
computation. For data that occupies a lot of memory, such as triangle meshes, I
suggest storing float data, but converting to double when data is accessed
through member functions. —P.S.

I advocate doing all computations with floats until you find evidence that
double precision is needed in a particular part of the code. —S.M.


1.7.3 Debugging Graphics Programs

If you ask around, you may find that as programmers become more experienced,
they use traditional debuggers less and less. One reason for this is that using such
debuggers is more awkward for complex programs than for simple programs.
Another reason is that the most difficult errors are conceptual ones where the
wrong thing is being implemented, and it is easy to waste large amounts of time
stepping through variable values without detecting such cases. We have found
several debugging strategies to be particularly useful in graphics.

The Scientific Method

In graphics programs there is an alternative to traditional debugging that is often
very useful. The downside to it is that it is very similar to what computer
programmers are taught not to do early in their careers, so you may feel “naughty”
if you do it: we create an image and observe what is wrong with it. Then, we
develop a hypothesis about what is causing the problem and test it. For example,
in a ray-tracing program we might have many somewhat random looking dark
pixels. This is the classic “shadow acne” problem that most people run into when
they write a ray tracer. Traditional debugging is not helpful here; instead, we must
realize that the shadow rays are hitting the surface being shaded. We might notice
that the color of the dark spots is the ambient color, so the direct lighting is what
is missing. Direct lighting can be turned off in shadow, so you might hypothesize
that these points are incorrectly being tagged as in shadow when they are not. To
test this hypothesis, we could turn off the shadowing check and recompile. If the
dark pixels disappear, this would indicate that these are false shadow tests, and
we could continue our detective work. The key reason that this method can
sometimes be good practice is
that we never had to spot a false value or really determine our conceptual error.
Instead, we just narrowed in on our conceptual error experimentally. Typically
only a few trials are needed to track things down, and this type of debugging is
enjoyable.

Images as Coded Debugging Output 

In many cases, the easiest channel by which to get debugging information out of a 
graphics program is the output image itself. If you want to know the value of some 
variable for part of a computation that runs for every pixel, you can just modify 
your program temporarily to copy that value directly to the output image and skip 
the rest of the calculations that would normally be done. For instance, if you 
suspect a problem with surface normals is causing a problem with shading, you 
can copy the normal vectors directly to the image (x goes to red, y goes to green, 
z goes to blue), resulting in a color-coded illustration of the vectors actually being 
used in your computation. Or, if you suspect a particular value is sometimes out 
of its valid range, make your program write bright red pixels where that happens. 
Other common tricks include drawing the back sides of surfaces with an obvious 
color (when they are not supposed to be visible), coloring the image by the ID 
numbers of the objects, or coloring pixels by the amount of work they took to 
compute. 
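
Two of these tricks are sketched below. The text only says that x goes to red, y to green, and z to blue; remapping each component from [−1, 1] to an 8-bit value in [0, 255] is our assumption about how to make negative components visible, and both function names are made up for the example.

```python
def normal_to_rgb(n):
    """Color-code a unit normal: x -> red, y -> green, z -> blue.

    Assumption: components in [-1, 1] are remapped linearly to 0..255.
    """
    return tuple(int(255 * (c + 1.0) / 2.0 + 0.5) for c in n)

def flag_out_of_range(value, lo, hi, normal_color):
    """Return bright red where value leaves [lo, hi]; else the usual color.

    Written as 'not (lo <= value <= hi)' so that NaN is flagged as well,
    since every comparison involving NaN is false.
    """
    if not (lo <= value <= hi):
        return (255, 0, 0)
    return normal_color
```

A normal pointing straight along +z comes out as (128, 128, 255), the familiar pale blue of normal maps, so regions where the normals are flipped or garbled stand out immediately.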

Using a Debugger 

There are still cases, particularly when the scientific method seems to have led 
to a contradiction, when there’s no substitute for observing exactly what is going 
on. The trouble is that graphics programs often involve many, many executions 
of the same code (once per pixel, for instance, or once per triangle), making it 
completely impractical to step through in the debugger from the start. And the 
most difficult bugs usually only occur for complicated inputs.

A useful approach is to “set a trap” for the bug. First, make sure your program
is deterministic—run it in a single thread and make sure that all random numbers 
are computed from fixed seeds. Then, find out which pixel or triangle is exhibiting 
the bug and add a statement before the code you suspect is incorrect that will be 
executed only for the suspect case. For instance, if you find that pixel (126,247) 
exhibits the bug, then add: 

if x = 126 and y = 247 then 
print “blarg!” 

If you set a breakpoint on the print statement, you can drop into the debugger just
before the pixel you’re interested in is computed. Some debuggers have a
“conditional breakpoint” feature that can achieve the same thing without modifying the
code.
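
Put together, a deterministic render loop with a trap might look like the sketch below. Everything here is illustrative: `shade` is a stand-in for the real per-pixel work, and recording the trapped pixel in a list takes the place of the print statement so that the example can run unattended.

```python
import random

def shade(x: int, y: int, rng: random.Random) -> float:
    """Stand-in for per-pixel work (hypothetical); uses one random number."""
    return rng.random()

def render(width: int, height: int, seed: int = 0, trap=None):
    """Render deterministically; note when the trapped pixel is reached."""
    rng = random.Random(seed)   # fixed seed: every run is identical
    trapped = []
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            if trap is not None and (x, y) == trap:
                # Put a debugger breakpoint on the next line (or call
                # breakpoint() here) to stop just before the suspect pixel.
                trapped.append((x, y))
            row.append(shade(x, y, rng))
        image.append(row)
    return image, trapped
```

Because the random numbers come from a fixed seed and the loop runs in a single thread, the pixel that misbehaved in one run will misbehave in exactly the same way when the trap is armed.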

In the cases where the program crashes, a traditional debugger is useful for
pinpointing the site of the crash. You should then start backtracking in the
program, using asserts and recompiles, to find where the program went wrong. These
asserts should be left in the program for potential future bugs you will add. This
again means the traditional step-through process is avoided, because that would
not be adding the valuable asserts to your program.


A special debugging mode that uses fixed random-number seeds is useful.


Data Visualization for Debugging 

Often it is hard to understand what your program is doing, because it computes a 
lot of intermediate results before it finally goes wrong. The situation is similar to 
a scientific experiment that measures a lot of data, and one solution is the same: 
make good plots and illustrations for yourself to understand what the data means. 
For instance, in a ray tracer you might write code to visualize ray trees so you 
can see what paths contributed to a pixel, or in an image resampling routine you 
might make plots that show all the points where samples are being taken from the 
input. Time spent writing code to visualize your program’s internal state is also 
repaid in a better understanding of its behavior when it comes time to optimize it. 


I like to format debugging 
print statements so that the 
output happens to be a 
Matlab or Gnuplot script 
that makes a helpful plot. 
—S.M. 
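
As a sketch of that idea, the function below formats a list of 2D sample points as a small Gnuplot script that uses inline data (the special file name "-", with a line containing only e to end the data). The function name and the choice of a points plot are our assumptions; adapt the plot command to whatever you want to see.

```python
def gnuplot_points_script(points, title="samples"):
    """Format 2D points as a Gnuplot script with inline data."""
    lines = [f'plot "-" with points title "{title}"']
    lines += [f"{x} {y}" for x, y in points]   # one "x y" pair per line
    lines.append("e")                          # terminates the inline data
    return "\n".join(lines)
```

Printing the returned string from a resampling routine and piping the program's output into Gnuplot then shows where the samples actually fell.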


Notes 

The discussion of software engineering is influenced by the Effective C++
series (Meyers, 1995, 1997), the Extreme Programming movement (Beck & Andres,
2004), and (Kernighan & Pike, 1999). The discussion of experimental debugging
is based on discussions with Steve Parker.

There are a number of annual conferences related to computer graphics,
including ACM SIGGRAPH and SIGGRAPH Asia, Graphics Interface, the Game
Developers Conference (GDC), Eurographics, Pacific Graphics, High Performance
Graphics, the Eurographics Symposium on Rendering, and IEEE VisWeek.
These can be readily found by web searches on their names. 












Miscellaneous Math 


Much of graphics is just translating math directly into code. The cleaner the math, 
the cleaner the resulting code; so much of this book concentrates on using just the 
right math for the job. This chapter reviews various tools from high school and 
college mathematics and is designed to be used more as a reference than as a
tutorial. It may appear to be a hodge-podge of topics and indeed it is; each topic
is chosen because it is a bit unusual in “standard” math curricula, because it is
of central importance in graphics, or because it is not typically treated from a
geometric standpoint. In addition to establishing a review of the notation used in
the book, the chapter also emphasizes a few points that are sometimes skipped
in the standard undergraduate curricula, such as barycentric coordinates on
triangles. This chapter is not intended to be a rigorous treatment of the material;
instead intuition and geometric interpretation are emphasized. A discussion of 
linear algebra is deferred until Chapter 5 just before transformation matrices are 
discussed. Readers are encouraged to skim this chapter to familiarize themselves 
with the topics covered and to refer back to it as needed. The exercises at the end 
of the chapter may be useful in determining which topics need a refresher. 


2.1 Sets and Mappings 

Mappings, also called functions, are basic to mathematics and programming. Like
a function in a program, a mapping in math takes an argument of one type and
maps it to (returns) an object of a particular type. In a program we say “type;” in
math we would identify the set. When we have an object that is a member of a
set, we use the ∈ symbol. For example,

a ∈ S,

can be read “a is a member of set S.” Given any two sets A and B, we can create
a third set by taking the Cartesian product of the two sets, denoted A × B. This
set A × B is composed of all possible ordered pairs (a, b) where a ∈ A and
b ∈ B. As a shorthand, we use the notation A² to denote A × A. We can extend
the Cartesian product to create a set of all possible ordered triples from three sets
and so on for arbitrarily long ordered tuples from arbitrarily many sets.

[Figure caption fragment: f and the inverse function f⁻¹. Note that f⁻¹ is also
a bijection.]

Common sets of interest include: 

• $\mathbb{R}$: the real numbers;

• $\mathbb{R}^+$: the non-negative real numbers (includes zero);

• $\mathbb{R}^2$: the ordered pairs in the real 2D plane;

• $\mathbb{R}^n$: the points in $n$-dimensional Cartesian space;

• $\mathbb{Z}$: the integers;

• $S^2$: the set of 3D points (points in $\mathbb{R}^3$) on the unit sphere.

Note that although $S^2$ is composed of points embedded in three-dimensional
space, they are on a surface that can be parameterized with two variables, so it
can be thought of as a 2D set. Notation for mappings uses the arrow and a colon,
for example:

$$f : \mathbb{R} \mapsto \mathbb{Z},$$

which you can read as “There is a function called $f$ that takes a real number as
input and maps it to an integer.” Here, the set that comes before the arrow is called
the domain of the function, and the set on the right-hand side is called the target.
Computer programmers might be more comfortable with the following equivalent
language: “There is a function called $f$ which has one real argument and returns
an integer.” In other words, the set notation above is equivalent to the common
programming notation:

integer $f$(real) ⟵ equivalent ⟶ $f : \mathbb{R} \mapsto \mathbb{Z}$.

So the colon-arrow notation can be thought of as a programming syntax. It’s that
simple.

The point $f(a)$ is called the image of $a$, and the image of a set $A$ (a subset of
the domain) is the subset of the target that contains the images of all points in $A$.
The image of the whole domain is called the range of the function.
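As a small illustration of reading the colon-arrow notation as a typed signature, the following Python sketch declares a mapping from reals to integers and computes the image of a finite set. The choice of the floor function for `f` and the helper name `image` are assumptions made for the example; they do not come from the text.

```python
import math
from typing import Callable, Iterable, Set


def f(x: float) -> int:
    """A mapping f : R -> Z. The floor function is an arbitrary example."""
    return math.floor(x)


def image(func: Callable[[float], int], points: Iterable[float]) -> Set[int]:
    """Image of a finite subset of the domain: the set of all func(p)."""
    return {func(p) for p in points}


A = [0.2, 0.9, 1.5, -0.5]
print(image(f, A))  # a subset of the target Z
```

The type annotations play the role of the domain and target; the set returned by `image` is generally smaller than the target.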
2.1.1 Inverse Mappings

If we have a function $f : A \mapsto B$, there may exist an inverse function $f^{-1} : B \mapsto A$,
which is defined by the rule $f^{-1}(b) = a$ where $b = f(a)$. This definition only
works if every $b \in B$ is an image of some point under $f$ (that is, the range equals
the target) and if there is only one such point (that is, there is only one $a$ for which
$f(a) = b$). Such mappings or functions are called bijections. A bijection maps
every $a \in A$ to a unique $b \in B$, and for every $b \in B$, there is exactly one $a \in A$
such that $f(a) = b$ (Figure 2.1). A bijection between a group of riders and horses
indicates that everybody rides a single horse, and every horse is ridden. The two
functions would be rider(horse) and horse(rider). These are inverse functions of
each other. Functions that are not bijections have no inverse (Figure 2.2).

An example of a bijection is $f : \mathbb{R} \mapsto \mathbb{R}$, with $f(x) = x^3$. The inverse
function is $f^{-1}(x) = \sqrt[3]{x}$. This example shows that the standard notation can be
somewhat awkward because $x$ is used as a dummy variable in both $f$ and $f^{-1}$. It
is sometimes more intuitive to use different dummy variables, with $y = f(x)$ and
$x = f^{-1}(y)$. This yields the more intuitive $y = x^3$ and $x = \sqrt[3]{y}$. An example of a
function that does not have an inverse is $\mathrm{sqr} : \mathbb{R} \mapsto \mathbb{R}$, where $\mathrm{sqr}(x) = x^2$. This
is true for two reasons: first $x^2 = (-x)^2$, and second no members of the domain
map to the negative portions of the target. Note that we can define an inverse if
we restrict the domain and range to $\mathbb{R}^+$. Then $\sqrt{x}$ is a valid inverse.
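The cube and square examples can be checked numerically. This is a minimal sketch; the function names `cube`, `cube_root`, and `sqr` are chosen here for illustration, and the real cube root is built from `abs` and `math.copysign` so that negative inputs work.

```python
import math


def cube(x: float) -> float:
    """f(x) = x^3, a bijection from R to R."""
    return x ** 3


def cube_root(y: float) -> float:
    """Inverse of cube on all of R (real cube root, sign preserved)."""
    return math.copysign(abs(y) ** (1.0 / 3.0), y)


def sqr(x: float) -> float:
    """sqr(x) = x^2, not a bijection on R: sqr(x) == sqr(-x)."""
    return x * x


# cube_root undoes cube, up to floating-point rounding.
print(cube_root(cube(-2.0)))
# Two different inputs share one output, so sqr has no inverse on R.
print(sqr(3.0) == sqr(-3.0))
# Restricted to non-negative reals, sqrt is a valid inverse of sqr.
print(math.sqrt(sqr(3.0)))
```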

2.1.2 Intervals 

Often we would like to specify that a function deals with real numbers that are
restricted in value. One such constraint is to specify an interval. An example of
an interval is the real numbers between zero and one, not including zero or one.
We denote this $(0,1)$. Because it does not include its endpoints, this is referred
to as an open interval. The corresponding closed interval, which does contain its
endpoints, is denoted with square brackets: $[0,1]$. This notation can be mixed, i.e.,
$[0,1)$ includes zero but not one. When writing an interval $[a, b]$, we assume that
$a < b$. The three common ways to represent an interval are shown in Figure 2.3.
The Cartesian products of intervals are often used. For example, to indicate that
a point $\mathbf{x}$ is in the unit cube in 3D, we say $\mathbf{x} \in [0,1]^3$.

Intervals are particularly useful in conjunction with set operations: intersection,
union, and difference. For example, the intersection of two intervals is the
set of points they have in common. The symbol $\cap$ is used for intersection. For
example, $[3,5) \cap [4,6] = [4,5)$. For unions, the symbol $\cup$ is used to denote points in
either interval. For example, $[3,5) \cup [4,6] = [3,6]$. Unlike the first two operators,
the difference operator produces different results depending on argument order.
The minus sign is used for the difference operator, which returns the points in the
left interval that are not also in the right. For example, $[3,5) - [4,6] = [3,4)$ and
$[4,6] - [3,5) = [5,6]$. These operations are particularly easy to visualize using
interval diagrams (Figure 2.4).

Figure 2.2. The function g does not have an inverse because two elements of d
map to the same element of E. The function h has no inverse because element T
of F has no element of d mapped to it.

Figure 2.3. Three equivalent ways to denote the interval from a to b that
includes b but not a.

Figure 2.4. Interval operations on [3,5) and [4,6].
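The bookkeeping for open and closed endpoints is easy to get wrong in code, so a small sketch may help. The `Interval` class below is hypothetical (not from the text) and covers only membership and intersection; union and difference are omitted because they can produce results that are not a single interval.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interval:
    """An interval with independently open or closed endpoints."""
    lo: float
    hi: float
    lo_closed: bool = True
    hi_closed: bool = True

    def contains(self, x: float) -> bool:
        above = x > self.lo or (self.lo_closed and x == self.lo)
        below = x < self.hi or (self.hi_closed and x == self.hi)
        return above and below

    def intersect(self, other: "Interval") -> Optional["Interval"]:
        # Lower end: the larger bound wins; on a tie, closed only if both are.
        if self.lo != other.lo:
            tight = self if self.lo > other.lo else other
            lo, lo_closed = tight.lo, tight.lo_closed
        else:
            lo, lo_closed = self.lo, self.lo_closed and other.lo_closed
        # Upper end: the smaller bound wins; same tie rule.
        if self.hi != other.hi:
            tight = self if self.hi < other.hi else other
            hi, hi_closed = tight.hi, tight.hi_closed
        else:
            hi, hi_closed = self.hi, self.hi_closed and other.hi_closed
        # Empty if the bounds cross, or meet at a point that is excluded.
        if lo > hi or (lo == hi and not (lo_closed and hi_closed)):
            return None
        return Interval(lo, hi, lo_closed, hi_closed)


a = Interval(3, 5, lo_closed=True, hi_closed=False)  # [3,5)
b = Interval(4, 6)                                   # [4,6]
print(a.intersect(b))                                # represents [4,5)
```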


2.1.3 Logarithms 


Although not as prevalent today as they were before calculators, logarithms are
often useful in problems where equations with exponential terms arise. By
definition, every logarithm has a base $a$. The “log base $a$” of $x$ is written $\log_a x$ and
is defined as “the exponent to which $a$ must be raised to get $x$,” i.e.,

$$y = \log_a x \iff a^y = x.$$

Note that the logarithm base $a$ and the function that raises $a$ to a power are inverses
of each other. This basic definition has several consequences:

$$a^{\log_a(x)} = x;$$
$$\log_a(a^x) = x;$$
$$\log_a(xy) = \log_a x + \log_a y;$$
$$\log_a(x/y) = \log_a x - \log_a y;$$
$$\log_a x = \log_a b \, \log_b x.$$
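These consequences can be spot-checked numerically. The sketch below defines a helper `log_base` (a name chosen here) in terms of the natural logarithm and compares both sides of each identity with `math.isclose`, since floating-point results agree only up to rounding. The sample values are arbitrary.

```python
import math


def log_base(a: float, x: float) -> float:
    """log_a(x), computed via the change-of-base identity."""
    return math.log(x) / math.log(a)


a, b, x, y = 2.0, 10.0, 8.0, 5.0

checks = {
    "a^(log_a x) = x": math.isclose(a ** log_base(a, x), x),
    "log_a(a^x) = x": math.isclose(log_base(a, a ** x), x),
    "log of product": math.isclose(log_base(a, x * y),
                                   log_base(a, x) + log_base(a, y)),
    "log of quotient": math.isclose(log_base(a, x / y),
                                    log_base(a, x) - log_base(a, y)),
    "change of base": math.isclose(log_base(a, x),
                                   log_base(a, b) * log_base(b, x)),
}
for name, ok in checks.items():
    print(name, ok)
```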

When we apply calculus to logarithms, the special number $e = 2.718\ldots$ often
turns up. The logarithm with base $e$ is called the natural logarithm. We adopt the
common shorthand ln to denote it:

$$\ln x \equiv \log_e x.$$

Note that the “$\equiv$” symbol can be read “is equivalent by definition.” Like $\pi$, the
special number $e$ arises in a remarkable number of contexts. Many fields use a
particular base in addition to $e$ for manipulations and omit the base in their notation,
i.e., $\log x$. For example, astronomers often use base 10 and theoretical computer
scientists often use base 2. Because computer graphics borrows technology from
many fields we will avoid this shorthand.

The derivatives of logarithms and exponents illuminate why the natural
logarithm is “natural”:

$$\frac{d}{dx} \log_a x = \frac{1}{x \ln a};$$
$$\frac{d}{dx} a^x = a^x \ln a.$$

The constant multipliers above are unity only for $a = e$.
2.2 Solving Quadratic Equations


A quadratic equation has the form

$$Ax^2 + Bx + C = 0,$$

where $x$ is a real unknown, and $A$, $B$, and $C$ are known constants. If you think
of a 2D $xy$ plot with $y = Ax^2 + Bx + C$, the solution is just whatever $x$ values
are “zero crossings” in $y$. Because $y = Ax^2 + Bx + C$ is a parabola, there will
be zero, one, or two real solutions depending on whether the parabola misses,
grazes, or hits the $x$-axis (Figure 2.5).

To solve the quadratic equation analytically, we first divide by $A$:

$$x^2 + \frac{B}{A}x + \frac{C}{A} = 0.$$

Then we “complete the square” to group terms:

$$\left(x + \frac{B}{2A}\right)^2 - \frac{B^2}{4A^2} + \frac{C}{A} = 0.$$

Moving the constant portion to the right-hand side and taking the square root gives

$$x + \frac{B}{2A} = \pm\sqrt{\frac{B^2}{4A^2} - \frac{C}{A}}.$$

Subtracting $B/(2A)$ from both sides and grouping terms with the denominator
$2A$ gives the familiar form:¹

$$x = \frac{-B \pm \sqrt{B^2 - 4AC}}{2A}. \qquad (2.1)$$


Here the “$\pm$” symbol means there are two solutions, one with a plus sign and
one with a minus sign. Thus $3 \pm 1$ equals “two or four.” Note that the term that
determines the number of real solutions is

$$D = B^2 - 4AC,$$

which is called the discriminant of the quadratic equation. If $D > 0$, there are two
real solutions (also called roots). If $D = 0$, there is one real solution (a “double”
root). If $D < 0$, there are no real solutions.

For example, the roots of $2x^2 + 6x + 4 = 0$ are $x = -1$ and $x = -2$, and the
equation $x^2 + x + 1 = 0$ has no real solutions. The discriminants of these equations are
$D = 4$ and $D = -3$, respectively, so we expect the number of solutions given.
In programs, it is usually a good idea to evaluate $D$ first and return “no roots”
without taking the square root if $D$ is negative.

¹ A robust implementation will use the equivalent expression $2C/(-B \mp \sqrt{B^2 - 4AC})$ to
compute one of the roots, depending on the sign of $B$ (Exercise 7).
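The advice above (test the discriminant first, and pick the numerically safer expression for each root) can be sketched as follows. This is one possible arrangement, not necessarily the one intended by Exercise 7; the function name `solve_quadratic` and the intermediate `q` are assumptions of this example.

```python
import math
from typing import List


def solve_quadratic(A: float, B: float, C: float) -> List[float]:
    """Real roots of A x^2 + B x + C = 0, in increasing order."""
    if A == 0:
        raise ValueError("not a quadratic: A must be nonzero")
    D = B * B - 4.0 * A * C          # discriminant
    if D < 0:
        return []                    # no real roots; skip the square root
    if D == 0:
        return [-B / (2.0 * A)]      # one double root
    # Add quantities of the same sign to avoid cancellation:
    # q = -(B + sign(B) * sqrt(D)) / 2.
    q = -0.5 * (B + math.copysign(math.sqrt(D), B))
    # One root is q / A (the usual formula); the other is C / q,
    # which equals 2C / (-B -+ sqrt(D)).
    return sorted([q / A, C / q])


print(solve_quadratic(2, 6, 4))   # the example 2x^2 + 6x + 4 = 0
print(solve_quadratic(1, 1, 1))   # the example x^2 + x + 1 = 0
```

When $D > 0$ the value `q` is never zero, so the division `C / q` is safe.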



Figure 2.5. The geometric 
interpretation of the roots 
of a quadratic equation is 
the intersection points of a 
parabola with the x-axis. 
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2.3 Trigonometry 

In graphics we use basic trigonometry in many contexts. Usually, it is nothing too 
fancy, and it often helps to remember the basic definitions. 



Figure 2.6. Two half-lines cut the unit circle into two arcs. The length of
either arc is a valid angle “between” the two half-lines. Either we can use the
convention that the smaller length is the angle, or that the two half-lines are
specified in a certain order and the arc that determines angle $\phi$ is the one swept out
counterclockwise from the first to the second half-line.

[Figure 2.7, caption partially lost: sides labeled $a$ and $o$; “... Pythagorean theorem.”]


2.3.1 Angles 

Although we take angles somewhat for granted, we should return to their
definition so we can extend the idea of the angle onto the sphere. An angle is formed
between two half-lines (infinite rays stemming from an origin) or directions, and
some convention must be used to decide between the two possibilities for the
angle created between them as shown in Figure 2.6. An angle is defined by the
length of the arc segment it cuts out on the unit circle. A common convention is
that the smaller arc length is used, and the sign of the angle is determined by the
order in which the two half-lines are specified. Using that convention, all angles
are in the range $[-\pi, \pi]$.

Each of these angles is the length of the arc of the unit circle that is “cut” by
the two directions. Because the perimeter of the unit circle is $2\pi$, the two possible
angles sum to $2\pi$. The unit of these arc lengths is radians. Another common unit
is degrees, where the perimeter of the circle is 360 degrees. Thus, an angle that is
$\pi$ radians is 180 degrees, usually denoted $180^\circ$. The conversion between degrees
and radians is

$$\text{degrees} = \frac{180}{\pi}\,\text{radians};$$
$$\text{radians} = \frac{\pi}{180}\,\text{degrees}.$$

2.3.2 Trigonometric Functions

Given a right triangle with sides of length $a$, $o$, and $h$, where $h$ is the length of
the longest side (which is always opposite the right angle), or hypotenuse, an
important relation is described by the Pythagorean theorem:

$$a^2 + o^2 = h^2.$$

You can see that this is true from Figure 2.7, where the big square has area $(a+o)^2$,
the four triangles have the combined area $2ao$, and the center square has area $h^2$.

Because the triangles and inner square subdivide the larger square evenly,
we have $2ao + h^2 = (a + o)^2$, which is easily manipulated to the form above.
We define sine and cosine of $\phi$, as well as the other ratio-based trigonometric
expressions:

$$\sin\phi \equiv o/h;$$
$$\csc\phi \equiv h/o;$$
$$\cos\phi \equiv a/h;$$
$$\sec\phi \equiv h/a;$$
$$\tan\phi \equiv o/a;$$
$$\cot\phi \equiv a/o.$$


These definitions allow us to set up polar coordinates, where a point is coded
as a distance from the origin and a signed angle relative to the positive $x$-axis
(Figure 2.8). Note the convention that angles are in the range $\phi \in (-\pi, \pi]$, and
that the positive angles are counterclockwise from the positive $x$-axis. This
convention that counterclockwise maps to positive numbers is arbitrary, but it is used
in many contexts in graphics so it is worth committing to memory.

Trigonometric functions are periodic and can take any angle as an argument.
For example $\sin(A) = \sin(A + 2\pi)$. This means the functions are not invertible
when considered with the domain $\mathbb{R}$. This problem is avoided by restricting the
range of standard inverse functions, and this is done in a standard way in almost
all modern math libraries (e.g., (Plauger, 1991)). The domains and ranges are:

$$\mathrm{asin} : [-1,1] \mapsto [-\pi/2, \pi/2];$$
$$\mathrm{acos} : [-1,1] \mapsto [0, \pi]; \qquad (2.2)$$
$$\mathrm{atan} : \mathbb{R} \mapsto [-\pi/2, \pi/2];$$
$$\mathrm{atan2} : \mathbb{R}^2 \mapsto [-\pi, \pi].$$

The last function, atan2(s, c), is often very useful. It takes an $s$ value proportional
to $\sin A$ and a $c$ value that scales $\cos A$ by the same factor and returns $A$. The
factor is assumed to be positive. One way to think of this is that it returns the
polar-coordinate angle of the 2D Cartesian point whose $x$-coordinate is $c$ and
whose $y$-coordinate is $s$ (Figure 2.9).
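A short sketch of the conversion between Cartesian and polar coordinates, using the standard-library `math.atan2(y, x)`, whose argument order matches atan2(s, c) above. The function names `to_polar` and `to_cartesian` are chosen here for illustration.

```python
import math
from typing import Tuple


def to_polar(x: float, y: float) -> Tuple[float, float]:
    """Return (r, phi), with phi measured counterclockwise from +x."""
    r = math.hypot(x, y)       # distance from the origin
    phi = math.atan2(y, x)     # y plays the role of s, x the role of c
    return r, phi


def to_cartesian(r: float, phi: float) -> Tuple[float, float]:
    """Inverse conversion: (r, phi) back to (x, y)."""
    return r * math.cos(phi), r * math.sin(phi)


r, phi = to_polar(1.0, math.sqrt(3.0))
print(r, phi, math.degrees(phi))   # roughly 2, pi/3, 60 degrees
```

Unlike `atan(y / x)`, `atan2` distinguishes all four quadrants and does not divide by zero when $x = 0$.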



Figure 2.8. Polar coordinates for the point $(x_a, y_a) = (1, \sqrt{3})$ are
$(r_a, \phi_a) = (2, \pi/3)$.



Figure 2.9. The function 
atan2(s,c) returns the angle 
A and is often very useful in 
graphics. 


2.3.3 Useful Identities 

This section lists without derivation a variety of useful trigonometric identities.

Shifting identities:

$$\sin(-A) = -\sin A$$
$$\cos(-A) = \cos A$$
$$\tan(-A) = -\tan A$$
$$\sin(\pi/2 - A) = \cos A$$
$$\cos(\pi/2 - A) = \sin A$$
$$\tan(\pi/2 - A) = \cot A$$

Pythagorean identities:

$$\sin^2 A + \cos^2 A = 1$$
$$\sec^2 A - \tan^2 A = 1$$
$$\csc^2 A - \cot^2 A = 1$$

Addition and subtraction identities:

$$\sin(A + B) = \sin A \cos B + \sin B \cos A$$
$$\sin(A - B) = \sin A \cos B - \sin B \cos A$$
$$\sin(2A) = 2 \sin A \cos A$$
$$\cos(A + B) = \cos A \cos B - \sin A \sin B$$
$$\cos(A - B) = \cos A \cos B + \sin A \sin B$$
$$\cos(2A) = \cos^2 A - \sin^2 A$$
$$\tan(A + B) = \frac{\tan A + \tan B}{1 - \tan A \tan B}$$
$$\tan(A - B) = \frac{\tan A - \tan B}{1 + \tan A \tan B}$$
$$\tan(2A) = \frac{2 \tan A}{1 - \tan^2 A}$$

Half-angle identities:

$$\sin^2(A/2) = (1 - \cos A)/2$$
$$\cos^2(A/2) = (1 + \cos A)/2$$

Product identities:

$$\sin A \sin B = -(\cos(A + B) - \cos(A - B))/2$$
$$\sin A \cos B = (\sin(A + B) + \sin(A - B))/2$$
$$\cos A \cos B = (\cos(A + B) + \cos(A - B))/2$$

The following identities are for arbitrary triangles with side lengths $a$, $b$, and $c$,
each with an angle opposite it given by $A$, $B$, $C$, respectively (Figure 2.10):

$$\frac{\sin A}{a} = \frac{\sin B}{b} = \frac{\sin C}{c} \quad \text{(Law of sines)}$$

$$c^2 = a^2 + b^2 - 2ab \cos C \quad \text{(Law of cosines)}$$

$$\frac{a + b}{a - b} = \frac{\tan\left(\frac{A+B}{2}\right)}{\tan\left(\frac{A-B}{2}\right)} \quad \text{(Law of tangents)}$$

The area of a triangle can also be computed in terms of these side lengths:

$$\text{triangle area} = \frac{1}{4}\sqrt{(a + b + c)(-a + b + c)(a - b + c)(a + b - c)}.$$

Figure 2.10. Geometry for triangle laws.
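The triangle laws translate directly into code. The sketch below uses hypothetical helper names (`triangle_area`, `angle_opposite`) and checks them on a 3-4-5 right triangle; it assumes the three side lengths satisfy the triangle inequality.

```python
import math


def triangle_area(a: float, b: float, c: float) -> float:
    """Area from the three side lengths (the formula given above)."""
    s = (a + b + c) * (-a + b + c) * (a - b + c) * (a + b - c)
    if s < 0:
        raise ValueError("side lengths violate the triangle inequality")
    return 0.25 * math.sqrt(s)


def angle_opposite(c: float, a: float, b: float) -> float:
    """Angle C opposite side c, from the law of cosines."""
    return math.acos((a * a + b * b - c * c) / (2.0 * a * b))


area = triangle_area(3.0, 4.0, 5.0)
C = angle_opposite(5.0, 3.0, 4.0)   # angle opposite the longest side
A = angle_opposite(3.0, 4.0, 5.0)   # angle opposite the side of length 3
print(area, math.degrees(C), math.degrees(A))
# Law of sines: sin(A)/a should match sin(C)/c.
print(math.sin(A) / 3.0, math.sin(C) / 5.0)
```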
2.4 Vectors

A vector describes a length and a direction. It can be usefully represented by an
arrow. Two vectors are equal if they have the same length and direction even if we
think of them as being located in different places (Figure 2.11). As much as
possible, you should think of a vector as an arrow and not as coordinates or numbers.
At some point we will have to represent vectors as numbers in our programs, but
even in code they should be manipulated as objects and only the low-level vector
operations should know about their numeric representation (DeRose, 1989).
Vectors will be represented as bold characters, e.g., $\mathbf{a}$. A vector’s length is denoted
$\|\mathbf{a}\|$. A unit vector is any vector whose length is one. The zero vector is the vector
of zero length. The direction of the zero vector is undefined.

Vectors can be used to represent many different things. For example, they can 
be used to store an offset, also called a displacement. If we know “the treasure is 
buried two paces east and three paces north of the secret meeting place,” then we 
know the offset, but we don't know where to start. Vectors can also be used to 
store a location, another word for position or point. Locations can be represented 
as a displacement from another location. Usually there is some understood origin 
location from which all other locations are stored as offsets. Note that locations 
are not vectors. As we shall discuss, you can add two vectors. However, it usually 
does not make sense to add two locations unless it is an intermediate operation 
when computing weighted averages of a location (Goldman, 1985). Adding two 
offsets does make sense, so that is one reason why offsets are vectors. But this
emphasizes that a location is not an offset; it is an offset from a specific origin
location. The offset by itself is not the location.


2.4.1 Vector Operations 

Vectors have most of the usual arithmetic operations that we associate with real
numbers. Two vectors are equal if and only if they have the same length and
direction. Two vectors are added according to the parallelogram rule. This rule states
that the sum of two vectors is found by placing the tail of either vector against the
head of the other (Figure 2.12). The sum vector is the vector that “completes the
triangle” started by the two vectors. The parallelogram is formed by taking the
sum in either order. This emphasizes that vector addition is commutative:

$$\mathbf{a} + \mathbf{b} = \mathbf{b} + \mathbf{a}.$$

Note that the parallelogram rule just formalizes our intuition about displacements.
Think of walking along one vector, tail to head, and then walking along the other.
The net displacement is just the parallelogram diagonal. You can also create a
unary minus for a vector: $-\mathbf{a}$ (Figure 2.13) is a vector with the same length as $\mathbf{a}$
but opposite direction. This allows us to also define subtraction:

$$\mathbf{b} - \mathbf{a} \equiv -\mathbf{a} + \mathbf{b}.$$

You can visualize vector subtraction with a parallelogram (Figure 2.14). We can
write

$$\mathbf{a} + (\mathbf{b} - \mathbf{a}) = \mathbf{b}.$$

Vectors can also be multiplied. In fact, there are several kinds of products
involving vectors. First, we can scale the vector by multiplying it by a real number $k$.
This just multiplies the vector’s length without changing its direction. For
example, $3.5\mathbf{a}$ is a vector in the same direction as $\mathbf{a}$ but it is 3.5 times as long as $\mathbf{a}$. We
discuss two products involving two vectors, the dot product and the cross
product, later in this section, and a product involving three vectors, the determinant, in
Chapter 5.

Figure 2.11. These two vectors are the same because they have the same
length and direction.

Figure 2.12. Two vectors are added by arranging them head to tail. This can
be done in either order.

Figure 2.13. The vector $-\mathbf{a}$ has the same length but opposite direction of the
vector $\mathbf{a}$.

Figure 2.14. Vector subtraction is just vector addition with a reversal of the
second argument.

[Figure 2.15, caption partially lost: “... vector $\mathbf{c}$ is a weighted sum of any two
non-parallel 2D vectors $\mathbf{a}$ and $\mathbf{b}$.”]

Figure 2.16. A 2D Cartesian basis for vectors.
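To illustrate the earlier advice that vectors be manipulated as objects, here is a minimal sketch of a 2D vector type whose arithmetic operators hide the numeric representation. The class name `Vec2` and the choice to store Cartesian components internally are assumptions of this example, not something the text prescribes.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Vec2:
    """A 2D vector; callers use the operators, not the components."""
    x: float
    y: float

    def __add__(self, other: "Vec2") -> "Vec2":
        return Vec2(self.x + other.x, self.y + other.y)

    def __neg__(self) -> "Vec2":
        return Vec2(-self.x, -self.y)

    def __sub__(self, other: "Vec2") -> "Vec2":
        # Subtraction is defined as addition of the negated vector.
        return self + (-other)

    def __rmul__(self, k: float) -> "Vec2":
        # Scalar on the left: k * v scales the length by |k|.
        return Vec2(k * self.x, k * self.y)

    def length(self) -> float:
        return math.hypot(self.x, self.y)


a = Vec2(2.0, 3.0)
b = Vec2(-1.0, 4.0)
print(a + b == b + a)                       # addition is commutative
print(a + (b - a) == b)                     # a + (b - a) = b
print((3.5 * a).length() / a.length())      # approximately 3.5
```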

2.4.2 Cartesian Coordinates of a Vector 

A 2D vector can be written as a combination of any two non-zero vectors which
are not parallel. This property of the two vectors is called linear independence.
Two linearly independent vectors form a 2D basis, and the vectors are thus
referred to as basis vectors. For example, a vector $\mathbf{c}$ may be expressed as a
combination of two basis vectors $\mathbf{a}$ and $\mathbf{b}$ (Figure 2.15):

$$\mathbf{c} = a_c \mathbf{a} + b_c \mathbf{b}. \qquad (2.3)$$

Note that the weights $a_c$ and $b_c$ are unique. Bases are especially useful if the
two vectors are orthogonal, i.e., they are at right angles to each other. It is even
more useful if they are also unit vectors in which case they are orthonormal. If we
assume two such “special” vectors $\mathbf{x}$ and $\mathbf{y}$ are known to us, then we can use them
to represent all other vectors in a Cartesian coordinate system, where each vector
is represented as two real numbers. For example, a vector $\mathbf{a}$ might be represented
as

$$\mathbf{a} = x_a \mathbf{x} + y_a \mathbf{y},$$

where $x_a$ and $y_a$ are the real Cartesian coordinates of the 2D vector $\mathbf{a}$
(Figure 2.16). Note that this is not really any different conceptually from
Equation (2.3), where the basis vectors were not orthonormal. But there are several
advantages to a Cartesian coordinate system. For instance, by the Pythagorean
theorem, the length of $\mathbf{a}$ is

$$\|\mathbf{a}\| = \sqrt{x_a^2 + y_a^2}.$$

It is also simple to compute dot products, cross products, and coordinates of
vectors in Cartesian systems, as we’ll see in the following sections.

By convention we write the coordinates of $\mathbf{a}$ either as an ordered pair $(x_a, y_a)$
or a column matrix:

$$\mathbf{a} = \begin{bmatrix} x_a \\ y_a \end{bmatrix}.$$

The form we use will depend on typographic convenience. We will also
occasionally write the vector as a row matrix, which we will indicate as $\mathbf{a}^T$:

$$\mathbf{a}^T = \begin{bmatrix} x_a & y_a \end{bmatrix}.$$

We can also represent 3D, 4D, etc., vectors in Cartesian coordinates. For the 3D
case, we use a basis vector $\mathbf{z}$ that is orthogonal to both $\mathbf{x}$ and $\mathbf{y}$.


2.4.3 Dot Product 


The simplest way to multiply two vectors is the dot product. The dot product of
$\mathbf{a}$ and $\mathbf{b}$ is denoted $\mathbf{a} \cdot \mathbf{b}$ and is often called the scalar product because it returns
a scalar. The dot product returns a value related to its arguments’ lengths and the
angle $\phi$ between them (Figure 2.17):

$$\mathbf{a} \cdot \mathbf{b} = \|\mathbf{a}\| \, \|\mathbf{b}\| \cos\phi. \qquad (2.4)$$

The most common use of the dot product in graphics programs is to compute the
cosine of the angle between two vectors.

The dot product can also be used to find the projection of one vector onto
another. This is the length $a{\rightarrow}b$ of a vector $\mathbf{a}$ that is projected at right angles onto
a vector $\mathbf{b}$ (Figure 2.18):

$$a{\rightarrow}b = \|\mathbf{a}\| \cos\phi = \frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{b}\|}. \qquad (2.5)$$

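The dot product obeys the familiar commutative, distributive, and (with scalars)
associative properties we have in real arithmetic:

$$\mathbf{a} \cdot \mathbf{b} = \mathbf{b} \cdot \mathbf{a},$$
$$\mathbf{a} \cdot (\mathbf{b} + \mathbf{c}) = \mathbf{a} \cdot \mathbf{b} + \mathbf{a} \cdot \mathbf{c}, \qquad (2.6)$$
$$(k\mathbf{a}) \cdot \mathbf{b} = \mathbf{a} \cdot (k\mathbf{b}) = k\,\mathbf{a} \cdot \mathbf{b}.$$

If 2D vectors $\mathbf{a}$ and $\mathbf{b}$ are expressed in Cartesian coordinates, we can take
advantage of $\mathbf{x} \cdot \mathbf{x} = \mathbf{y} \cdot \mathbf{y} = 1$ and $\mathbf{x} \cdot \mathbf{y} = 0$ to derive that their dot product
is

$$\mathbf{a} \cdot \mathbf{b} = (x_a \mathbf{x} + y_a \mathbf{y}) \cdot (x_b \mathbf{x} + y_b \mathbf{y})$$
$$= x_a x_b (\mathbf{x} \cdot \mathbf{x}) + x_a y_b (\mathbf{x} \cdot \mathbf{y}) + x_b y_a (\mathbf{y} \cdot \mathbf{x}) + y_a y_b (\mathbf{y} \cdot \mathbf{y})$$
$$= x_a x_b + y_a y_b.$$

Similarly in 3D we can find

$$\mathbf{a} \cdot \mathbf{b} = x_a x_b + y_a y_b + z_a z_b.$$

Figure 2.17. The dot product is related to length and angle and is one of the
most important formulas in graphics.

Figure 2.18. The projection of $\mathbf{a}$ onto $\mathbf{b}$ is a length found by Equation (2.5).

Figure 2.19. The cross product $\mathbf{a} \times \mathbf{b}$ is a 3D vector perpendicular to both 3D
vectors $\mathbf{a}$ and $\mathbf{b}$, and its length is equal to the area of the parallelogram shown.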
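The coordinate form of the dot product, together with Equations (2.4) and (2.5), gives a compact way to compute angles and projected lengths. The following sketch stores vectors as plain tuples of Cartesian coordinates; the function names are chosen here for illustration, and the clamp in `angle_between` is a defensive assumption against rounding pushing the cosine slightly outside $[-1, 1]$.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of componentwise products; works in 2D, 3D, and beyond."""
    return sum(ai * bi for ai, bi in zip(a, b))


def length(a: Sequence[float]) -> float:
    return math.sqrt(dot(a, a))


def angle_between(a: Sequence[float], b: Sequence[float]) -> float:
    """Angle phi from Equation (2.4); both vectors must be non-zero."""
    c = dot(a, b) / (length(a) * length(b))
    return math.acos(max(-1.0, min(1.0, c)))


def projected_length(a: Sequence[float], b: Sequence[float]) -> float:
    """Signed length of a projected onto b, Equation (2.5)."""
    return dot(a, b) / length(b)


print(dot((1, 2, 3), (4, 5, 6)))
print(math.degrees(angle_between((1, 0), (0, 2))))
print(projected_length((3, 4), (1, 0)))
```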


2.4.4 Cross Product 

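The cross product $\mathbf{a} \times \mathbf{b}$ is usually used only for three-dimensional vectors;
generalized cross products are discussed in references given in the chapter notes. The
cross product returns a 3D vector that is perpendicular to the two arguments of
the cross product. The length of the resulting vector is related to $\sin\phi$:

$$\|\mathbf{a} \times \mathbf{b}\| = \|\mathbf{a}\| \, \|\mathbf{b}\| \sin\phi.$$

The magnitude $\|\mathbf{a} \times \mathbf{b}\|$ is equal to the area of the parallelogram formed by vectors
$\mathbf{a}$ and $\mathbf{b}$. In addition, $\mathbf{a} \times \mathbf{b}$ is perpendicular to both $\mathbf{a}$ and $\mathbf{b}$ (Figure 2.19). Note
that there are only two possible directions for such a vector. By definition, the
vectors in the direction of the $x$-, $y$- and $z$-axes are given by

$$\mathbf{x} = (1,0,0),$$
$$\mathbf{y} = (0,1,0),$$
$$\mathbf{z} = (0,0,1),$$

and we set as a convention that $\mathbf{x} \times \mathbf{y}$ must be in the plus or minus $\mathbf{z}$ direction.
The choice is somewhat arbitrary, but it is standard to assume that

$$\mathbf{z} = \mathbf{x} \times \mathbf{y}.$$

All possible permutations of the three Cartesian unit vectors are

$$\mathbf{x} \times \mathbf{y} = +\mathbf{z},$$
$$\mathbf{y} \times \mathbf{x} = -\mathbf{z},$$
$$\mathbf{y} \times \mathbf{z} = +\mathbf{x},$$
$$\mathbf{z} \times \mathbf{y} = -\mathbf{x},$$
$$\mathbf{z} \times \mathbf{x} = +\mathbf{y},$$
$$\mathbf{x} \times \mathbf{z} = -\mathbf{y}.$$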
Because of the $\sin\phi$ property, we also know that a vector cross itself is the
zero-vector, so $\mathbf{x} \times \mathbf{x} = \mathbf{0}$ and so on. Note that the cross product is not
commutative, i.e., $\mathbf{x} \times \mathbf{y} \neq \mathbf{y} \times \mathbf{x}$. The careful observer will note that the above discussion
does not allow us to draw an unambiguous picture of how the Cartesian axes
relate. More specifically, if we put $\mathbf{x}$ and $\mathbf{y}$ on a sidewalk, with $\mathbf{x}$ pointing East
and $\mathbf{y}$ pointing North, then does $\mathbf{z}$ point up to the sky or into the ground? The
usual convention is to have $\mathbf{z}$ point to the sky. This is known as a right-handed
coordinate system. This name comes from the memory scheme of “grabbing” $\mathbf{x}$
with your right palm and fingers and rotating it toward $\mathbf{y}$. The vector $\mathbf{z}$ should
align with your thumb. This is illustrated in Figure 2.20.

The cross product has the nice property that

$$\mathbf{a} \times (\mathbf{b} + \mathbf{c}) = \mathbf{a} \times \mathbf{b} + \mathbf{a} \times \mathbf{c},$$

and

$$\mathbf{a} \times (k\mathbf{b}) = k(\mathbf{a} \times \mathbf{b}).$$

However, a consequence of the right-hand rule is

$$\mathbf{a} \times \mathbf{b} = -(\mathbf{b} \times \mathbf{a}).$$


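In Cartesian coordinates, we can use an explicit expansion to compute the cross
product:

$$\mathbf{a} \times \mathbf{b} = (x_a \mathbf{x} + y_a \mathbf{y} + z_a \mathbf{z}) \times (x_b \mathbf{x} + y_b \mathbf{y} + z_b \mathbf{z})$$
$$= x_a x_b \, \mathbf{x} \times \mathbf{x} + x_a y_b \, \mathbf{x} \times \mathbf{y} + x_a z_b \, \mathbf{x} \times \mathbf{z}$$
$$+ \; y_a x_b \, \mathbf{y} \times \mathbf{x} + y_a y_b \, \mathbf{y} \times \mathbf{y} + y_a z_b \, \mathbf{y} \times \mathbf{z} \qquad (2.7)$$
$$+ \; z_a x_b \, \mathbf{z} \times \mathbf{x} + z_a y_b \, \mathbf{z} \times \mathbf{y} + z_a z_b \, \mathbf{z} \times \mathbf{z}$$
$$= (y_a z_b - z_a y_b)\mathbf{x} + (z_a x_b - x_a z_b)\mathbf{y} + (x_a y_b - y_a x_b)\mathbf{z}.$$

So, in coordinate form,

$$\mathbf{a} \times \mathbf{b} = (y_a z_b - z_a y_b, \; z_a x_b - x_a z_b, \; x_a y_b - y_a x_b). \qquad (2.8)$$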
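Equation (2.8) is short enough to transcribe directly. The sketch below assumes vectors are stored as 3-tuples of Cartesian coordinates; the function name `cross` is chosen here, and the checks repeat a few of the unit-vector permutations listed earlier.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def cross(a: Vec3, b: Vec3) -> Vec3:
    """Cross product in coordinate form, Equation (2.8)."""
    xa, ya, za = a
    xb, yb, zb = b
    return (ya * zb - za * yb,
            za * xb - xa * zb,
            xa * yb - ya * xb)


x, y, z = (1, 0, 0), (0, 1, 0), (0, 0, 1)
print(cross(x, y))    # +z
print(cross(y, x))    # -z: the product is anticommutative
print(cross(x, x))    # the zero vector

# The length of a x b is the area of the parallelogram spanned by a and b.
a, b = (2.0, 0.0, 0.0), (1.0, 3.0, 0.0)
n = cross(a, b)
print(math.sqrt(sum(c * c for c in n)))   # base 2 times height 3
```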


2.4.5 Orthonormal Bases and Coordinate Frames 

Managing coordinate systems is one of the core tasks of almost any graphics
program; key to this is managing orthonormal bases. Any set of two 2D vectors
$\mathbf{u}$ and $\mathbf{v}$ form an orthonormal basis provided that they are orthogonal (at right
angles) and are each of unit length. Thus,

$$\|\mathbf{u}\| = \|\mathbf{v}\| = 1,$$

and

$$\mathbf{u} \cdot \mathbf{v} = 0.$$

In 3D, three vectors $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ form an orthonormal basis if

$$\|\mathbf{u}\| = \|\mathbf{v}\| = \|\mathbf{w}\| = 1,$$

and

$$\mathbf{u} \cdot \mathbf{v} = \mathbf{v} \cdot \mathbf{w} = \mathbf{w} \cdot \mathbf{u} = 0.$$

This orthonormal basis is right-handed provided

$$\mathbf{w} = \mathbf{u} \times \mathbf{v},$$

and otherwise it is left-handed.

Figure 2.20. The “right-hand rule” for cross products. Imagine placing the
base of your right palm where $\mathbf{a}$ and $\mathbf{b}$ join at their tails, and pushing the
arrow of $\mathbf{a}$ toward $\mathbf{b}$. Your extended right thumb should point toward $\mathbf{a} \times \mathbf{b}$.

Note that the Cartesian canonical orthonormal basis is just one of infinitely
many possible orthonormal bases. What makes it special is that it and its implicit
origin location are used for low-level representation within a program. Thus,
the vectors $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$ are never explicitly stored and neither is the canonical
origin location $\mathbf{o}$. The global model is typically stored in this canonical coordinate
system, and it is thus often called the global coordinate system. However, if we
want to use another coordinate system with origin $\mathbf{p}$ and orthonormal basis vectors
$\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$, then we do store those vectors explicitly. Such a system is called
a frame of reference or coordinate frame. For example, in a flight simulator, we
might want to maintain a coordinate system with the origin at the nose of the
plane, and the orthonormal basis aligned with the airplane. Simultaneously, we
would have the master canonical coordinate system (Figure 2.21). The coordinate
system associated with a particular object, such as the plane, is usually called a
local coordinate system.

Figure 2.21. There is always a master or “canonical” coordinate system with origin $\mathbf{o}$ and
orthonormal basis $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$. This coordinate system is usually defined to be aligned to the
global model and is thus often called the “global” or “world” coordinate system. This origin
and basis vectors are never stored explicitly. All other vectors and locations are stored with
coordinates that relate them to the global frame. The coordinate system associated with the
plane is explicitly stored in terms of global coordinates.

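At a low level, the local frame is stored in canonical coordinates. For example,
if $\mathbf{u}$ has coordinates $(x_u, y_u, z_u)$,

$$\mathbf{u} = x_u \mathbf{x} + y_u \mathbf{y} + z_u \mathbf{z}.$$

A location implicitly includes an offset from the canonical origin:

$$\mathbf{p} = \mathbf{o} + x_p \mathbf{x} + y_p \mathbf{y} + z_p \mathbf{z},$$

where $(x_p, y_p, z_p)$ are the coordinates of $\mathbf{p}$.

Note that if we store a vector $\mathbf{a}$ with respect to the $\mathbf{u}$-$\mathbf{v}$-$\mathbf{w}$ frame, we store a
triple $(u_a, v_a, w_a)$ which we can interpret geometrically as

$$\mathbf{a} = u_a \mathbf{u} + v_a \mathbf{v} + w_a \mathbf{w}.$$

To get the canonical coordinates of a vector $\mathbf{a}$ stored in the $\mathbf{u}$-$\mathbf{v}$-$\mathbf{w}$ coordinate
system, simply recall that $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ are themselves stored in terms of Cartesian
coordinates, so the expression $u_a \mathbf{u} + v_a \mathbf{v} + w_a \mathbf{w}$ is already in Cartesian
coordinates if evaluated explicitly. To get the $\mathbf{u}$-$\mathbf{v}$-$\mathbf{w}$ coordinates of a vector $\mathbf{b}$ stored in
the canonical coordinate system, we can use dot products:

$$u_b = \mathbf{u} \cdot \mathbf{b}; \quad v_b = \mathbf{v} \cdot \mathbf{b}; \quad w_b = \mathbf{w} \cdot \mathbf{b}.$$

This works because we know that for some $u_b$, $v_b$, and $w_b$,

$$u_b \mathbf{u} + v_b \mathbf{v} + w_b \mathbf{w} = \mathbf{b},$$

and the dot product isolates the $u_b$ coordinate:

$$\mathbf{u} \cdot \mathbf{b} = u_b (\mathbf{u} \cdot \mathbf{u}) + v_b (\mathbf{u} \cdot \mathbf{v}) + w_b (\mathbf{u} \cdot \mathbf{w})$$
$$= u_b.$$

This works because $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ are orthonormal.

Using matrices to manage changes of coordinate systems is discussed in
Sections 6.2.1 and 6.5.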
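The two conversions described above can be sketched in a few lines. This example handles vectors (offsets) only, so no origin is involved; the function names `to_local` and `to_canonical` and the sample frame (the canonical basis rotated a quarter turn about $\mathbf{z}$) are assumptions made for illustration.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def to_local(b: Vec3, u: Vec3, v: Vec3, w: Vec3) -> Vec3:
    """u-v-w coordinates of b: one dot product per basis vector.
    Valid only when u, v, w are orthonormal."""
    return (dot(u, b), dot(v, b), dot(w, b))


def to_canonical(coords: Vec3, u: Vec3, v: Vec3, w: Vec3) -> Vec3:
    """Evaluate u_a*u + v_a*v + w_a*w using the stored canonical
    coordinates of the basis vectors."""
    ua, va, wa = coords
    return (ua * u[0] + va * v[0] + wa * w[0],
            ua * u[1] + va * v[1] + wa * w[1],
            ua * u[2] + va * v[2] + wa * w[2])


# A right-handed orthonormal frame, stored in canonical coordinates.
u, v, w = (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 0.0, 1.0)

b = (2.0, 3.0, 4.0)
local = to_local(b, u, v, w)
print(local)
print(to_canonical(local, u, v, w))   # round trip recovers b
```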
This same procedure can, of course, be used to construct the three vectors in
any order; just pay attention to the order of the cross products to ensure the
basis is right-handed.

$\mathbf{u} = \mathbf{a} \times \mathbf{b}$ also produces an orthonormal basis, but it is left-handed.


2.4.6 Constructing a Basis from a Single Vector 

Often we need an orthonormal basis that is aligned with a given vector. That is,
given a vector $\mathbf{a}$, we want an orthonormal $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ such that $\mathbf{w}$ points in the
same direction as $\mathbf{a}$ (Hughes & Möller, 1999), but we don’t particularly care what
$\mathbf{u}$ and $\mathbf{v}$ are. One vector isn’t enough to uniquely determine the answer; we just
need a robust procedure that will find any one of the possible bases.

This can be done using cross products as follows. First make $\mathbf{w}$ a unit vector
in the direction of $\mathbf{a}$:

$$\mathbf{w} = \frac{\mathbf{a}}{\|\mathbf{a}\|}.$$

Then choose any vector $\mathbf{t}$ not collinear with $\mathbf{w}$, and use the cross product to build
a unit vector $\mathbf{u}$ perpendicular to $\mathbf{w}$:

$$\mathbf{u} = \frac{\mathbf{t} \times \mathbf{w}}{\|\mathbf{t} \times \mathbf{w}\|}.$$

If $\mathbf{t}$ is collinear with $\mathbf{w}$ the denominator will vanish, and if they are nearly collinear
the results will have low precision. A simple procedure to find a vector
sufficiently different from $\mathbf{w}$ is to start with $\mathbf{t}$ equal to $\mathbf{w}$ and change the smallest
magnitude component of $\mathbf{t}$ to 1. For example, if $\mathbf{w} = (1/\sqrt{2}, -1/\sqrt{2}, 0)$ then
$\mathbf{t} = (1/\sqrt{2}, -1/\sqrt{2}, 1)$. Once $\mathbf{w}$ and $\mathbf{u}$ are in hand, completing the basis is
simple:

$$\mathbf{v} = \mathbf{w} \times \mathbf{u}.$$

An example of a situation where this construction is used is surface shading,
where a basis aligned to the surface normal is needed but the rotation around
the normal is often unimportant.
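The single-vector construction can be sketched as follows, using the rule of setting the smallest-magnitude component of $\mathbf{t}$ to 1. The function names (`basis_from_one_vector`, `normalize`, and so on) are chosen for this example, and vectors are assumed to be 3-tuples of canonical coordinates.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(a: Vec3) -> Vec3:
    n = math.sqrt(dot(a, a))
    if n == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return (a[0] / n, a[1] / n, a[2] / n)


def basis_from_one_vector(a: Vec3) -> Tuple[Vec3, Vec3, Vec3]:
    """Return a right-handed orthonormal (u, v, w) with w along a."""
    w = normalize(a)
    # t: a copy of w with its smallest-magnitude component set to 1,
    # which guarantees t is not collinear with w.
    i = min(range(3), key=lambda k: abs(w[k]))
    t = tuple(1.0 if k == i else w[k] for k in range(3))
    u = normalize(cross(t, w))
    v = cross(w, u)
    return u, v, w


u, v, w = basis_from_one_vector((1.0, -1.0, 0.0))
print(u, v, w)
```

Because $\mathbf{u}$ is a unit vector perpendicular to $\mathbf{w}$, the final cross product $\mathbf{w} \times \mathbf{u}$ is already unit length and needs no normalization beyond rounding error.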


2.4.7 Constructing a Basis from Two Vectors 

The procedure in the previous section can also be used in situations where the 
rotation of the basis around the given vector is important. A common example 
is building a basis for a camera: it’s important to have one vector aligned in the 
direction the camera is looking, but the orientation of the camera around that 
vector is not arbitrary, and it needs to be specified somehow. Once the orientation 
is pinned down, the basis is completely determined. 

A common way to fully specify a frame is by providing two vectors $\mathbf{a}$ (which
specifies $\mathbf{w}$) and $\mathbf{b}$ (which specifies $\mathbf{v}$). If the two vectors are known to be
perpendicular it is a simple matter to construct the third vector by $\mathbf{u} = \mathbf{b} \times \mathbf{a}$.





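To be sure that the resulting basis really is orthonormal, even if the input
vectors weren’t quite, a procedure much like the single-vector procedure is advisable:

$$\mathbf{w} = \frac{\mathbf{a}}{\|\mathbf{a}\|},$$
$$\mathbf{u} = \frac{\mathbf{b} \times \mathbf{w}}{\|\mathbf{b} \times \mathbf{w}\|},$$
$$\mathbf{v} = \mathbf{w} \times \mathbf{u}.$$

In fact, this procedure works just fine when $\mathbf{a}$ and $\mathbf{b}$ are not perpendicular. In this
case, $\mathbf{w}$ will be constructed exactly in the direction of $\mathbf{a}$, and $\mathbf{v}$ is chosen to be the
closest vector to $\mathbf{b}$ among all vectors perpendicular to $\mathbf{w}$.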

This procedure won’t work if a and b are collinear. In this case b is of no 
help in choosing which of the directions perpendicular to a we should use: it is 
perpendicular to all of them. 

In the example of specifying camera positions (Section 4.3), we want to construct
a frame that has w parallel to the direction the camera is looking, and v
should point out the top of the camera. To orient the camera upright, we build the
basis around the view direction, using the straight-up direction as the reference
vector to establish the camera’s orientation around the view direction. Setting v
as close as possible to straight up exactly matches the intuitive notion of “holding
the camera straight.”
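
The two-vector procedure can be sketched the same way. This is an illustrative sketch
under stated assumptions: the function name `basis_from_two_vectors` and the collinearity
threshold `1e-12` are choices made here, not something the text prescribes.

```python
import math

def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def cross(a, b):
    """Right-handed cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def basis_from_two_vectors(a, b):
    """Return (u, v, w) with w along a and v as close as possible to b."""
    w = normalize(a)
    bw = cross(b, w)
    # If a and b are collinear, b x w is the zero vector and cannot be normalized.
    if math.sqrt(sum(c * c for c in bw)) < 1e-12:  # assumed threshold
        raise ValueError("a and b are (nearly) collinear")
    u = normalize(bw)
    v = cross(w, u)
    return u, v, w
```

With a = (0, 0, 2) and a tilted reference b = (0, 1, 0.5), the result keeps w along a and
returns v = (0, 1, 0), the unit vector perpendicular to w that is closest to b.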


If you want me to set w and v to two non-perpendicular directions, something has
to give. With this scheme I’ll set everything the way you want, except I’ll make
the smallest change to v so that it is in fact perpendicular to w.

What will go wrong with the computation if a and b are parallel?


2.4.8 Squaring Up a Basis 

Occasionally you may find problems caused in your computations by a basis that 
is supposed to be orthonormal but where error has crept in—due to rounding error 
in computation, or to the basis having been stored in a file with low precision, for 
instance. 

The procedure of the previous section can be used; simply constructing the 
basis anew using the existing w and v vectors will produce a new basis that is 
orthonormal and is close to the old one. 

This approach is good for many applications, but it is not the best available. 
It does produce accurately orthogonal vectors, and for nearly orthogonal starting 
bases the result will not stray far from the starting point. However, it is asymmetric:
it “favors” w over v and v over u (whose starting value is thrown away).
It chooses a basis close to the starting basis but has no guarantee of choosing the 
closest orthonormal basis. When this is not good enough, the SVD (Section 5.4.1) 
can be used to compute an orthonormal basis that is guaranteed to be closest to 
the original basis.


Figure 2.22. An implicit function f(x, y) = 0 can be thought of as a height field
where f is the height (top). A path where the height is zero is the implicit curve
(bottom).

Figure 2.23. An implicit function f(x, y) = 0 can be thought of as a height field
where f is the height (top). A path where the height is zero is the implicit curve
(bottom). The plotted function is $f(x, y) = x^2 + y^2 - 1$.


2.5 Curves and Surfaces 

The geometry of curves, and especially surfaces, plays a central role in graphics, 
and here we review the basics of curves and surfaces in 2D and 3D space. 


2.5.1 2D Implicit Curves 

Intuitively, a curve is a set of points that can be drawn on a piece of paper without 
lifting the pen. A common way to describe a curve is using an implicit equation. 
An implicit equation in two dimensions has the form 

f(x, y) = 0. 

The function f(x, y) returns a real value. Points (x, y) where this value is zero
are on the curve, and points where the value is non-zero are not on the curve. For
example, let’s say that f(x, y) is

$$f(x, y) = (x - x_c)^2 + (y - y_c)^2 - r^2, \tag{2.9}$$

where $(x_c, y_c)$ is a 2D point and r is a non-zero real number. If we take
f(x, y) = 0, the points where this equality holds are on the circle with center
$(x_c, y_c)$ and radius r. The reason that this is called an “implicit” equation is
that the points (x, y) on the curve cannot be immediately calculated from the equation
and instead must be determined by solving the equation. Thus, the points on the curve
are not generated by the equation explicitly, but they are buried somewhere implicitly
in the equation.

It is interesting to note that f does have values for all (x, y). We can think of f
as a terrain, with sea level at f = 0 (Figure 2.22). The shore is the implicit curve.
The value of f is the altitude. Another thing to note is that the curve partitions
space into regions where f > 0, f < 0, and f = 0. So you evaluate f to decide
whether a point is “inside” a curve. Note that f(x, y) = c is a curve for any
constant c, and c = 0 is just used as a convention. For example, if
$f(x, y) = x^2 + y^2 - 1$, varying c just gives a variety of circles centered at the
origin (Figure 2.23).

We can compress our notation using vectors. If we have $\mathbf{c} = (x_c, y_c)$ and
$\mathbf{p} = (x, y)$, then our circle with center c and radius r is defined by those
position vectors that satisfy

$$(\mathbf{p} - \mathbf{c}) \cdot (\mathbf{p} - \mathbf{c}) - r^2 = 0.$$

This equation, if expanded algebraically, will yield Equation (2.9), but it is easier
to see that this is an equation for a circle by “reading” the equation geometrically.
It reads, “points p on the circle have the following property: the vector from c to
p when dotted with itself has value $r^2$.” Because a vector dotted with itself is just
its own length squared, we could also read the equation as, “points p on the circle
have the following property: the vector from c to p has squared length $r^2$.”

Even better is to observe that the squared length is just the squared distance
from c to p, which suggests the equivalent form

$$\|\mathbf{p} - \mathbf{c}\|^2 - r^2 = 0,$$

and, of course, this suggests

$$\|\mathbf{p} - \mathbf{c}\| - r = 0.$$

The above could be read “the points p on the circle are those a distance r from
the center point c,” which is as good a definition of circle as any. This illustrates
that the vector form of an equation often suggests more geometry and intuition
than the equivalent full-blown Cartesian form with x and y. For this reason, it
is usually advisable to use vector forms when possible. In addition, you can
support a vector class in your code; the code is cleaner when vector forms are
used. The vector-oriented equations are also less error prone in implementation:
once you implement and debug vector types in your code, the cut-and-paste errors
involving x, y, and z will go away. It takes a little while to get used to vectors in
these equations, but once you get the hang of it, the payoff is large.
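
As a small illustration of evaluating an implicit function, the sketch below computes
the vector form of the circle equation and uses its sign to classify a point. The names
`circle_implicit` and `classify`, and the tolerance `eps`, are assumptions made for this
example only.

```python
def circle_implicit(p, c, r):
    """f(p) = (p - c) . (p - c) - r^2 for 2D points given as (x, y) tuples."""
    dx, dy = p[0] - c[0], p[1] - c[1]
    return dx * dx + dy * dy - r * r

def classify(p, c, r, eps=1e-9):
    """Report where p lies relative to the circle, using the sign of f."""
    f = circle_implicit(p, c, r)
    if abs(f) <= eps:  # assumed tolerance for "on the curve"
        return "on"
    return "outside" if f > 0 else "inside"
```

For a circle of radius 5 centered at the origin, (3, 4) is classified as on the curve,
(0, 0) as inside, and (6, 0) as outside.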


2.5.2 The 2D Gradient 


If we think of the function f(x, y) as a height field with height = f(x, y), the
gradient vector points in the direction of maximum upslope, i.e., straight uphill.
The gradient vector $\nabla f(x, y)$ is given by

$$\nabla f(x, y) = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right).$$


The gradient vector evaluated at a point on the implicit curve f(x,y) = 0 is 
perpendicular to the tangent vector of the curve at that point. This perpendicular 
vector is usually called the normal vector to the curve. In addition, since the 
gradient points uphill, it indicates the direction of the f(x, y) > 0 region. 

Figure 2.24. A surface height = f(x, y) is locally planar near (x, y) = (a, b). The
gradient is a projection of the uphill direction onto the height = 0 plane.

Figure 2.25. The derivative of a 1D function measures the slope of the line
tangent to the curve.

Figure 2.26. The partial derivative of a 2D function with respect to x must hold y
constant to have a unique value, as shown by the dark point. The hollow points show
other values of f that do not hold y constant.

In the context of height fields, the geometric meaning of partial derivatives and
gradients is more visible than usual. Suppose that near the point (a, b), f(x, y) is
a plane (Figure 2.24). There is a specific uphill and downhill direction. At right
angles to this direction is a direction that is level with respect to the plane. Any
intersection between the plane and the f(x, y) = 0 plane will be in the direction
that is level. Thus the uphill/downhill directions will be perpendicular to the line
of intersection f(x, y) = 0. To see why the partial derivative has something to do
with this, we need to visualize its geometric meaning. Recall that the conventional
derivative of a 1D function y = g(x) is


$$\frac{dy}{dx} \equiv \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x} = \lim_{\Delta x \to 0} \frac{g(x + \Delta x) - g(x)}{\Delta x}. \tag{2.10}$$


This measures the slope of the tangent line to g (Figure 2.25).

The partial derivative is a generalization of the 1D derivative. For a 2D function
f(x, y), we can’t take the same limit for x as in Equation (2.10), because
f can change in many ways for a given change in x. However, if we hold y
constant, we can define an analog of the derivative, called the partial derivative
(Figure 2.26):

$$\frac{\partial f}{\partial x} \equiv \lim_{\Delta x \to 0} \frac{f(x + \Delta x, y) - f(x, y)}{\Delta x}.$$

Why is it that the partial derivatives with respect to x and y are the components of
the gradient vector? Again, there is more obvious insight in the geometry than in the
algebra. In Figure 2.27, we see the vector a travels along a path where f does not
change. Note that this is again at a small enough scale that the surface
height(x, y) = f(x, y) can be considered locally planar. From the figure, we see that
the vector $\mathbf{a} = (x_a, y_a) = (\Delta x, \Delta y)$.

Because the uphill direction is perpendicular to a, we know the dot product is
equal to zero:

$$(\nabla f) \cdot \mathbf{a} = (x_\nabla, y_\nabla) \cdot (x_a, y_a) = x_\nabla \Delta x + y_\nabla \Delta y = 0. \tag{2.11}$$

We also know that the change in f in the direction $(x_a, y_a)$ equals zero:

$$\Delta f = \frac{\partial f}{\partial x} \Delta x + \frac{\partial f}{\partial y} \Delta y = \frac{\partial f}{\partial x} x_a + \frac{\partial f}{\partial y} y_a = 0. \tag{2.12}$$


Given any vectors (x, y) and (x', y') that are perpendicular, we know that the
angle between them is 90 degrees, and thus their dot product equals zero (recall
that the dot product is proportional to the cosine of the angle between the two
vectors). Thus, we have xx' + yy' = 0. Given (x, y), it is easy to construct
valid vectors whose dot product with (x, y) equals zero, the two most obvious
being (y, -x) and (-y, x); you can verify that these vectors give the desired
zero dot product with (x, y). A generalization of this observation is that (x, y) is
perpendicular to k(y, -x) where k is any non-zero constant. This implies that

$$(x_a, y_a) = k \left(\frac{\partial f}{\partial y}, -\frac{\partial f}{\partial x}\right). \tag{2.13}$$


Combining Equations (2.11) and (2.13) gives

$$(x_\nabla, y_\nabla) = k' \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right),$$

where $k'$ is any non-zero constant. By definition, “uphill” implies a positive
change in f, so we would like $k' > 0$, and $k' = 1$ is a perfectly good convention.

As an example of the gradient, consider the implicit circle $x^2 + y^2 - 1 = 0$
with gradient vector (2x, 2y), indicating that the outside of the circle is the
positive region for the function $f(x, y) = x^2 + y^2 - 1$. Note that the length
of the gradient vector can be different depending on the multiplier in the implicit
equation. For example, the unit circle can be described by $Ax^2 + Ay^2 - A = 0$ for
any non-zero A. The gradient for this curve is (2Ax, 2Ay). This will be normal
(perpendicular) to the circle, but will have a length determined by A. For A > 0,
the normal will point outward from the circle, and for A < 0, it will point inward.
This switch from outward to inward is as it should be, since the positive region
switches inside the circle. In terms of the height-field view, $h = Ax^2 + Ay^2 - A$,
and the circle is at zero altitude. For A > 0, the circle encloses a depression,
and for A < 0, the circle encloses a bump. As A becomes more negative, the
bump increases in height, but the h = 0 circle doesn’t change. The direction
of maximum uphill doesn’t change, but the slope increases. The length of the
gradient reflects this change in degree of the slope. So intuitively, you can think
of the gradient’s direction as pointing uphill and its magnitude as measuring how
steep the slope is.
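
The gradient of the unit circle can be checked numerically. The sketch below is an
illustration, not the text's method: it approximates each partial derivative with a
central difference (a symmetric variant of the one-sided limit used in the definition),
and the names `gradient` and `unit_circle` and the step size `h` are assumptions.

```python
def gradient(f, x, y, h=1e-6):
    """Approximate (df/dx, df/dy) at (x, y) with central differences."""
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)  # y held constant
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)  # x held constant
    return dfdx, dfdy

def unit_circle(x, y):
    """Implicit unit circle f(x, y) = x^2 + y^2 - 1."""
    return x * x + y * y - 1.0
```

At the point (0.6, 0.8) on the circle the estimate is close to (1.2, 1.6), which is
(2x, 2y) and points outward. Multiplying f by A = -3 scales the estimate to about
(-3.6, -4.8), which points inward, as described above.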

Implicit 2D Lines 

The familiar “slope-intercept” form of the line is 

y = mx + b. (2.14) 

This can be converted easily to implicit form (Figure 2.28):

y - mx - b = 0. (2.15)

Here m is the “slope” (ratio of rise to run) and b is the y value where the line
crosses the y-axis, usually called the y-intercept. The line also partitions the 2D
plane, but here “inside” and “outside” might be more intuitively called “over” and
“under.”

Because we can multiply an implicit equation by any constant without changing
the points where it is zero, kf(x, y) = 0 is the same curve for any non-zero
k. This allows several implicit forms for the same line, for example,

2y - 2mx - 2b = 0.

Figure 2.27. The vector a points in a direction where f has no change and is thus
perpendicular to the gradient vector $\nabla f$.

Figure 2.28. A 2D line can be described by the equation y - mx - b = 0.

Figure 2.29. The gradient vector (A, B) is perpendicular to the implicit line
Ax + By + C = 0.

One reason the slope-intercept form is sometimes awkward is that it can’t represent
some lines such as x = 0 because m would have to be infinite. For this
reason, a more general form is often useful:

Ax + By + C = 0, (2.16) 

for real numbers A, B, C.

Suppose we know two points on the line, $(x_0, y_0)$ and $(x_1, y_1)$. What A, B,
and C describe the line through these two points? Because these points lie on the
line, they must both satisfy Equation (2.16):

$$Ax_0 + By_0 + C = 0,$$

$$Ax_1 + By_1 + C = 0.$$

Unfortunately we have two equations and three unknowns: A, B, and C. This
problem arises because of the arbitrary multiplier we can have with an implicit
equation. We could set C = 1 for convenience:

Ax + By + 1 = 0,

but we have a similar problem to the infinite slope case in slope-intercept form:
lines through the origin would need to have A(0) + B(0) + 1 = 0, which is a
contradiction. For example, the equation for a 45-degree line through the origin
can be written x - y = 0, or equally well y - x = 0, or even 17y - 17x = 0, but
it cannot be written in the form Ax + By + 1 = 0.

Whenever we have such pesky algebraic problems, we try to solve the problems
using geometric intuition as a guide. One tool we have, as discussed in
Section 2.5.2, is the gradient. For the line Ax + By + C = 0, the gradient vector
is (A, B). This vector is perpendicular to the line (Figure 2.29), and points to the
side of the line where Ax + By + C is positive. Given two points on the line
$(x_0, y_0)$ and $(x_1, y_1)$, we know that the vector between them points in the same
direction as the line. This vector is just $(x_1 - x_0, y_1 - y_0)$, and because it is
parallel to the line, it must also be perpendicular to the gradient vector (A, B). Recall
that there are an infinite number of (A, B, C) that describe the line because of the
arbitrary scaling property of implicits. We want any one of the valid (A, B, C).

We can start with any (A, B) perpendicular to $(x_1 - x_0, y_1 - y_0)$. Such a vector
is just $(A, B) = (y_0 - y_1, x_1 - x_0)$ by the same reasoning as in Section 2.5.2.
This means that the equation of the line through $(x_0, y_0)$ and $(x_1, y_1)$ is

$$(y_0 - y_1)x + (x_1 - x_0)y + C = 0. \tag{2.17}$$

Now we just need to find C. Because $(x_0, y_0)$ and $(x_1, y_1)$ are on the line, they
must satisfy Equation (2.17). We can plug either value in and solve for C. Doing
this for $(x_0, y_0)$ yields $C = x_0 y_1 - x_1 y_0$, and thus the full equation for the
line is


$$(y_0 - y_1)x + (x_1 - x_0)y + x_0 y_1 - x_1 y_0 = 0. \tag{2.18}$$

Again, this is one of infinitely many valid implicit equations for the line through
two points, but this form has no division operation and thus no numerically
degenerate cases for points with finite Cartesian coordinates. A nice thing about
Equation (2.18) is that we can always convert to the slope-intercept form (when
it exists) by moving the non-y terms to the right-hand side of the equation and
dividing by the multiplier of the y term:

$$y = \frac{y_1 - y_0}{x_1 - x_0}\, x + \frac{x_1 y_0 - x_0 y_1}{x_1 - x_0}.$$
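
Equation (2.18) translates almost directly into code. The sketch below is illustrative;
the function names `implicit_line` and `line_value` are assumptions, and points are
(x, y) tuples.

```python
def implicit_line(p0, p1):
    """Return (A, B, C) such that A*x + B*y + C = 0 passes through p0 and p1."""
    (x0, y0), (x1, y1) = p0, p1
    # No division is involved, so vertical lines need no special case.
    return (y0 - y1, x1 - x0, x0 * y1 - x1 * y0)

def line_value(coeffs, p):
    """Evaluate f(x, y) = A*x + B*y + C at the point p."""
    A, B, C = coeffs
    return A * p[0] + B * p[1] + C
```

For the 45-degree line through (0, 0) and (1, 1) this gives (A, B, C) = (-1, 1, 0), that
is, y - x = 0. For the vertical line through (2, 0) and (2, 5) it gives (-5, 0, 10),
which is x = 2 after dividing by -5, a line that slope-intercept form cannot represent.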

An interesting property of the implicit line equation is that it can be used to find
the signed distance from a point to the line. The value of Ax + By + C is
proportional to the distance from the line (Figure 2.30). As shown in Figure 2.31,
the distance from a point to the line is the length of the vector k(A, B), which is

$$\text{distance} = k\sqrt{A^2 + B^2}. \tag{2.19}$$

For the point (x, y) + k(A, B), the value of f(x, y) = Ax + By + C is

$$f(x + kA, y + kB) = Ax + kA^2 + By + kB^2 + C = k(A^2 + B^2). \tag{2.20}$$

The simplification in that equation is a result of the fact that we know (x, y) is on
the line, so Ax + By + C = 0. From Equations (2.19) and (2.20), we can see that
the signed distance from line Ax + By + C = 0 to a point (a, b) is

$$\text{distance} = \frac{f(a, b)}{\sqrt{A^2 + B^2}}.$$


Here “signed distance” means that its magnitude (absolute value) is the geometric
distance, but on one side of the line, distances are positive and on the other they are
negative. You can choose between the equally valid representations f(x, y) = 0
and -f(x, y) = 0 if your problem has some reason to prefer a particular side
being positive. Note that if (A, B) is a unit vector, then f(a, b) is the signed
distance. We can multiply Equation (2.18) by a constant that ensures that (A, B)
is a unit vector:

$$f(x, y) = \frac{y_0 - y_1}{\sqrt{(x_1 - x_0)^2 + (y_0 - y_1)^2}}\, x + \frac{x_1 - x_0}{\sqrt{(x_1 - x_0)^2 + (y_0 - y_1)^2}}\, y + \frac{x_0 y_1 - x_1 y_0}{\sqrt{(x_1 - x_0)^2 + (y_0 - y_1)^2}} = 0. \tag{2.21}$$

Note that evaluating f(x, y) in Equation (2.21) directly gives the signed distance,
but it does require a square root to set up the equation. Implicit lines will turn
out to be very useful for triangle rasterization (Section 8.1.2). Other forms for 2D
lines are discussed in Chapter 14.
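
A signed-distance query follows from the formula above. This is a minimal sketch with
an assumed function name, `signed_distance`; it rebuilds (A, B, C) from two points on
the line and divides by the length of (A, B).

```python
import math

def signed_distance(p0, p1, q):
    """Signed distance from point q to the line through p0 and p1."""
    (x0, y0), (x1, y1) = p0, p1
    A, B, C = y0 - y1, x1 - x0, x0 * y1 - x1 * y0
    # Dividing f(q) by the length of the gradient (A, B) turns it into a distance.
    return (A * q[0] + B * q[1] + C) / math.hypot(A, B)
```

For the x-axis, given by the points (0, 0) and (1, 0), the point (3, 2) is at distance 2
and the point (3, -2) is at distance -2. Swapping p0 and p1 flips the sign, which
corresponds to choosing -f(x, y) = 0 instead of f(x, y) = 0.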



Figure 2.30. The value of the implicit function f(x, y) = Ax + By + C is a constant
times the signed distance from Ax + By + C = 0.

Figure 2.31. The vector k(A, B) connects a point (x, y) on the line closest to
a point not on the line. The distance is proportional to k.

Figure 2.32. The ellipse with center $(x_c, y_c)$ and semi-axes of length a and b.

Try setting a = b = r in the ellipse equation and compare to the circle equation.


Implicit Quadric Curves 

In the previous section we saw that a linear function f(x, y) gives rise to an
implicit line f(x, y) = 0. If f is instead a quadratic function of x and y, with the
general form

$$Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0,$$

the resulting implicit curve is called a quadric. Two-dimensional quadric curves
include ellipses and hyperbolas, as well as the special cases of parabolas, circles,
and lines.

Examples of quadric curves include the circle with center $(x_c, y_c)$ and radius r:

$$(x - x_c)^2 + (y - y_c)^2 - r^2 = 0,$$

and axis-aligned ellipses of the form

$$\frac{(x - x_c)^2}{a^2} + \frac{(y - y_c)^2}{b^2} - 1 = 0,$$

where $(x_c, y_c)$ is the center of the ellipse, and a and b are the minor and major
semi-axes (Figure 2.32).



2.5.3 3D Implicit Surfaces 

Just as implicit equations can be used to define curves in 2D, they can be used to
define surfaces in 3D. As in 2D, implicit equations implicitly define a set of points
that are on the surface:

f(x, y, z) = 0.

Any point (x, y, z) that is on the surface results in zero when given as an argument
to f. Any point not on the surface results in some number other than zero. You
can check whether a point is on the surface by evaluating f, or you can check
which side of the surface the point lies on by looking at the sign of f, but you
cannot always explicitly construct points on the surface. Using vector notation,
we will write such functions of p = (x, y, z) as

f(p) = 0.


2.5.4 Surface Normal to an Implicit Surface 

A surface normal (which is needed for lighting computations, among other things)
is a vector perpendicular to the surface. Each point on the surface may have a
different normal vector. In the same way that the gradient provides a normal to
an implicit curve in 2D, the surface normal at a point p on an implicit surface is
given by the gradient of the implicit function

$$\mathbf{n} = \nabla f(\mathbf{p}) = \left(\frac{\partial f(\mathbf{p})}{\partial x}, \frac{\partial f(\mathbf{p})}{\partial y}, \frac{\partial f(\mathbf{p})}{\partial z}\right).$$

The reasoning is the same as for the 2D case: the gradient points in the direction
of fastest increase in f, which is perpendicular to all directions tangent to the
surface, in which f remains constant. The gradient vector points toward the side
of the surface where f(p) > 0, which we may think of as “into” the surface or
“out from” the surface in a given context. If the particular form of f creates
inward-facing gradients and outward-facing gradients are desired, the surface -f(p) = 0
is the same as surface f(p) = 0 but has directionally reversed gradients, i.e.,
$-\nabla f(\mathbf{p}) = \nabla(-f(\mathbf{p}))$.


2.5.5 Implicit Planes 


As an example, consider the infinite plane through point a with surface normal n. 
The implicit equation to describe this plane is given by 

$$(\mathbf{p} - \mathbf{a}) \cdot \mathbf{n} = 0. \tag{2.22}$$

Note that a and n are known quantities. The point p is any unknown point that 
satisfies the equation. In geometric terms this equation says “the vector from a to 
p is perpendicular to the plane normal.” If p were not in the plane, then (p — a) 
would not make a right angle with n (Figure 2.33). 

Sometimes we want the implicit equation for a plane through points a, b, 
and c. The normal to this plane can be found by taking the cross product of any 
two vectors in the plane. One such cross product is 

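$$\mathbf{n} = (\mathbf{b} - \mathbf{a}) \times (\mathbf{c} - \mathbf{a}).$$

This allows us to write the implicit plane equation:

$$(\mathbf{p} - \mathbf{a}) \cdot ((\mathbf{b} - \mathbf{a}) \times (\mathbf{c} - \mathbf{a})) = 0. \tag{2.23}$$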


A geometric way to read this equation is that the volume of the parallelepiped
defined by p - a, b - a, and c - a is zero, i.e., they are coplanar. This can
only be true if p is in the same plane as a, b, and c. The full-blown Cartesian
representation for this is given by the determinant (this is discussed in more detail
in Section 5.3):

$$\begin{vmatrix} x - x_a & y - y_a & z - z_a \\ x_b - x_a & y_b - y_a & z_b - z_a \\ x_c - x_a & y_c - y_a & z_c - z_a \end{vmatrix} = 0. \tag{2.24}$$

Figure 2.33. Any of the points p shown are in the plane with normal vector
n that includes point a if Equation (2.22) is satisfied.

The determinant can be expanded (see Section 5.3 for the mechanics of expanding
determinants) to the bloated form with many terms.

Equations (2.23) and (2.24) are equivalent, and comparing them is instructive.
Equation (2.23) is easy to interpret geometrically and will yield efficient
code. In addition, it is relatively easy to avoid a typographic error that compiles
into incorrect code if it takes advantage of debugged cross and dot product code.
Equation (2.24) is also easy to interpret geometrically and will be efficient
provided an efficient 3 × 3 determinant function is implemented. It is also easy to
implement without a typo if a function determinant(a, b, c) is available. It will
be especially easy for others to read your code if you rename the determinant
function volume. So both Equations (2.23) and (2.24) map well into code. The
full expansion of either equation into x-, y-, and z-components is likely to generate
typos. Such typos are likely to compile and, thus, be especially pesky. This
is an excellent example of clean math generating clean code and bloated math
generating bloated code.
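
To make the point about clean math and clean code concrete, here is a sketch of
Equation (2.23) built from small vector helpers. The helper names `sub`, `dot`, `cross`,
and `plane_implicit` are assumptions for this illustration, and points are 3-tuples.

```python
def sub(a, b):
    """Component-wise difference a - b."""
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    """Dot product of two vectors."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Right-handed cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def plane_implicit(p, a, b, c):
    """f(p) = (p - a) . ((b - a) x (c - a)); zero when p is in the plane of a, b, c."""
    return dot(sub(p, a), cross(sub(b, a), sub(c, a)))
```

With a = (0, 0, 0), b = (1, 0, 0), and c = (0, 1, 0) the normal is (0, 0, 1), so any
point with z = 0 evaluates to zero, and the sign of the result tells which side of the
plane a point lies on.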

3D Quadric Surfaces 

Just as quadratic polynomials in two variables define quadric curves in 2D, quadratic
polynomials in x, y, and z define quadric surfaces in 3D. For instance, a sphere
can be written as

$$f(\mathbf{p}) = (\mathbf{p} - \mathbf{c})^2 - r^2 = 0,$$

and an axis-aligned ellipsoid may be written as

$$f(\mathbf{p}) = \frac{(x - x_c)^2}{a^2} + \frac{(y - y_c)^2}{b^2} + \frac{(z - z_c)^2}{c^2} - 1 = 0.$$


3D Curves from Implicit Surfaces 

One might hope that an implicit 3D curve could be created with the form f(p) = 0.
However, all such curves are just degenerate surfaces and are rarely useful in
practice. A 3D curve can be constructed from the intersection of two simultaneous
implicit equations:

f(p) = 0,

g(p) = 0.

For example, a 3D line can be formed from the intersection of two implicit planes.
Typically, it is more convenient to use parametric curves instead; they are
discussed in the following sections.


2.5.6 2D Parametric Curves


A parametric curve is controlled by a single parameter that can be considered a 
sort of index that moves continuously along the curve. Such curves have the form 


$$\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} g(t) \\ h(t) \end{bmatrix}.$$


Here (x, y) is a point on the curve, and t is the parameter that influences the curve.
For a given t, there will be some point determined by the functions g and h. For
continuous g and h, a small change in t will yield a small change in x and y.
Thus, as t continuously changes, points are swept out in a continuous curve. This
is a nice feature because we can use the parameter t to explicitly construct points
on the curve. Often we can write a parametric curve in vector form,

$$\mathbf{p} = f(t),$$

where f is a vector-valued function, $f : \mathbb{R} \to \mathbb{R}^2$. Such vector
functions can generate very clean code, so they should be used when possible.

We can think of the curve as the position of a point as a function of time. The curve
can go anywhere and could loop and cross itself. We can also think of the curve
as having a velocity at any point. For example, the point p(t) is traveling slowly
near t = -2 and quickly between t = 2 and t = 3. This type of “moving point”
vocabulary is often used when discussing parametric curves even when the curve
is not describing a moving point.

2D Parametric Lines 

A parametric line in 2D that passes through points $\mathbf{p}_0 = (x_0, y_0)$ and
$\mathbf{p}_1 = (x_1, y_1)$ can be written

$$\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x_0 + t(x_1 - x_0) \\ y_0 + t(y_1 - y_0) \end{bmatrix}.$$

Because the formulas for x and y have such similar structure, we can use the
vector form for p = (x, y) (Figure 2.34):

$$\mathbf{p}(t) = \mathbf{p}_0 + t(\mathbf{p}_1 - \mathbf{p}_0).$$

You can read this in geometric form as: “start at point $\mathbf{p}_0$ and go some
distance toward $\mathbf{p}_1$ determined by the parameter t.” A nice feature of this
form is that $\mathbf{p}(0) = \mathbf{p}_0$ and $\mathbf{p}(1) = \mathbf{p}_1$. Since the
point changes linearly with t, the value of t between $\mathbf{p}_0$ and $\mathbf{p}_1$
measures the fractional distance between the points. Points with t < 0 are to the “far”
side of $\mathbf{p}_0$, and points with t > 1 are to the “far” side of $\mathbf{p}_1$.

Figure 2.34. A 2D parametric line through $\mathbf{p}_0$ and $\mathbf{p}_1$. The line
segment defined by $t \in [0, 1]$ is shown in bold.

Parametric lines can also be described as just a point o and a vector d:

$$\mathbf{p}(t) = \mathbf{o} + t\mathbf{d}.$$

When the vector d has unit length, the line is arc-length parameterized. This
means t is an exact measure of distance along the line. Any parametric curve can
be arc-length parameterized, which is obviously a very convenient form, but not
all can be converted analytically.

2D Parametric Circles 

A circle with center $(x_c, y_c)$ and radius r has a parametric form:

$$\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x_c + r\cos\phi \\ y_c + r\sin\phi \end{bmatrix}.$$

To ensure that there is a unique parameter $\phi$ for every point on the curve, we can
restrict its domain: $\phi \in [0, 2\pi)$ or $\phi \in (-\pi, \pi]$ or any other
half-open interval of length $2\pi$.

An axis-aligned ellipse can be constructed by scaling the x and y parametric
equations separately:

$$\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x_c + a\cos\phi \\ y_c + b\sin\phi \end{bmatrix}.$$

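
The contrast between the two representations shows up clearly in code: the parametric
form constructs points, and the implicit form tests them. The sketch below uses assumed
names (`ellipse_point`, `ellipse_implicit`) and is only an illustration.

```python
import math

def ellipse_point(center, a, b, phi):
    """Point on an axis-aligned ellipse at parameter phi (radians)."""
    xc, yc = center
    return (xc + a * math.cos(phi), yc + b * math.sin(phi))

def ellipse_implicit(p, center, a, b):
    """Implicit form: zero on the ellipse, negative inside, positive outside."""
    xc, yc = center
    return ((p[0] - xc) / a) ** 2 + ((p[1] - yc) / b) ** 2 - 1.0

# Sample eight parameter values in the half-open interval [0, 2*pi).
samples = [ellipse_point((1.0, 2.0), 3.0, 2.0, k * math.pi / 4) for k in range(8)]
```

Every sampled point evaluates to zero in the implicit form, up to rounding error.
Setting a = b = r gives points on the parametric circle.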


2.5.7 3D Parametric Curves 

A 3D parametric curve operates much like a 2D parametric curve:

x = f(t),
y = g(t),
z = h(t).

For example, a spiral around the z-axis is written as:

x = cos t,
y = sin t,
z = t.

As with 2D curves, the functions f, g, and h are defined on a domain
$D \subset \mathbb{R}$ if we want to control where the curve starts and ends. In vector
form we can write

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \mathbf{p}(t).$$

In this chapter we only discuss 3D parametric lines in detail. General 3D
parametric curves are discussed more extensively in Chapter 15.

The parametric curve is the range of $\mathbf{p} : \mathbb{R} \to \mathbb{R}^3$.

3D Parametric Lines 

A 3D parametric line can be written as a straightforward extension of the 2D
parametric line, e.g.,

x = 2 + 7t,
y = 1 + 2t,
z = 3 - 5t.

This is cumbersome and does not translate well to code variables, so we will write
it in vector form:

$$\mathbf{p} = \mathbf{o} + t\mathbf{d},$$

where, for this example, o and d are given by

o = (2, 1, 3),
d = (7, 2, -5).

Note that this is very similar to the 2D case. The way to visualize this is to
imagine that the line passes through o and is parallel to d. Given any value of t,
you get some point p(t) on the line. For example, at t = 2, p(t) = (2, 1, 3) +
2(7, 2, -5) = (16, 5, -7). This general concept is the same as for two dimensions
(Figure 2.34).
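
The vector form maps directly onto a small function. This sketch assumes points and
directions are tuples of equal length; the names `point_on_line` and `segment_point`
are hypothetical.

```python
def point_on_line(o, d, t):
    """Evaluate p(t) = o + t*d component by component."""
    return tuple(oi + t * di for oi, di in zip(o, d))

def segment_point(a, b, t):
    """Evaluate p(t) = a + t*(b - a); t = 0 gives a and t = 1 gives b."""
    return tuple(ai + t * (bi - ai) for ai, bi in zip(a, b))
```

Calling `point_on_line((2, 1, 3), (7, 2, -5), 2)` reproduces the worked example and
returns (16, 5, -7). The same functions work unchanged for 2D points.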

As in 2D, a line segment can be described by a 3D parametric line and an
interval $t \in [t_a, t_b]$. The line segment between two points a and b is given by
p(t) = a + t(b - a) with $t \in [0, 1]$. Here p(0) = a, p(1) = b, and p(0.5) =
(a + b)/2, the midpoint between a and b.

A ray, or half-line, is a 3D parametric line with a half-open interval, usually
$[0, \infty)$. From now on we will refer to all lines, line segments, and rays
as “rays.” This is sloppy, but corresponds to common usage and makes the
discussion simpler.


2.5.8 3D Parametric Surfaces

The parametric approach can be used to define surfaces in 3D space in much the 
same way we define curves, except that there are two parameters to address the 
two-dimensional area of the surface. These surfaces have the form 

x = f(u, v),
y = g(u, v),
z = h(u, v),

or, in vector form,

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \mathbf{p}(u, v).$$

The parametric surface is the range of the function
$\mathbf{p} : \mathbb{R}^2 \to \mathbb{R}^3$.

Pretend for the sake of argument that the Earth is exactly spherical.

The $\theta$ and $\phi$ here may or may not seem reversed depending on your background;
the use of these symbols varies across disciplines. In this book we will always assume
the meaning of $\theta$ and $\phi$ used in Equation (2.25) and depicted in Figure 2.35.

Figure 2.35. The geometry for spherical coordinates.

Example. A point on the surface of the Earth can be described by the
two parameters longitude and latitude. If we define the origin to be at the center of
the Earth, and let r be the radius of the Earth, then a spherical coordinate system
centered at the origin (Figure 2.35) lets us derive the parametric equations

$$x = r\cos\phi\sin\theta,$$

$$y = r\sin\phi\sin\theta, \tag{2.25}$$

$$z = r\cos\theta.$$

Ideally, we’d like to write this in vector form, but it isn’t feasible for this particular
parametric form.

We would also like to be able to find the $(\theta, \phi)$ for a given (x, y, z). If we
assume that $\phi \in (-\pi, \pi]$, this is easy to do using the atan2 function from
Equation (2.2):

$$\theta = \operatorname{acos}\left(z / \sqrt{x^2 + y^2 + z^2}\right),$$

$$\phi = \operatorname{atan2}(y, x). \tag{2.26}$$
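
Equations (2.25) and (2.26) can be checked against each other with a round trip. The
sketch below assumes the angle convention of Equation (2.25) and uses Python's
`math.acos` and `math.atan2`; the function names are hypothetical.

```python
import math

def spherical_to_cartesian(r, theta, phi):
    """Equation (2.25): theta is measured from the z-axis, phi around it."""
    return (r * math.cos(phi) * math.sin(theta),
            r * math.sin(phi) * math.sin(theta),
            r * math.cos(theta))

def cartesian_to_spherical(x, y, z):
    """Equation (2.26), with phi returned in (-pi, pi] by atan2."""
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(z / r)
    phi = math.atan2(y, x)
    return r, theta, phi
```

Converting (r, theta, phi) = (3, 1.0, -2.0) to Cartesian coordinates and back recovers
the same three values up to rounding error. The inverse is undefined at the origin, and
phi is arbitrary at the poles where x = y = 0.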


With implicit surfaces, the derivative of the function / gave us the surface 
normal. With parametric surfaces, the derivatives of p also give information about 
the surface geometry. 

Consider the function q(t) = p(i,uo). This function defines a parametric 
curve obtained by varying u while holding v fixed at the value vq. This curve, 
called an isoparametric curve (or sometimes “isoparm” for short) lies in the sur¬ 
face. The derivative of q gives a vector tangent to the curve, and since the curve 
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lies in the surface the vector q' also lies in the surface. Since it was obtained by 
varying one argument of p, the vector q' is the partial derivative of p with respect 
to u, which we’ll denote p„ . A similar argument shows that the partial derivative 
p v gives the tangent to the isoparametric curves for constant u, which is a second 
tangent vector to the surface. 

The derivative of p, then, gives two tangent vectors at any point on the sur¬ 
face. The normal to the surface may be found by taking the cross product of 
these vectors: since both are tangent to the surface, their cross product, which is 
perpendicular to both tangents, is normal to the surface. The right-hand rule for 
cross products provides a way to decide which side is the front, or outside, of the 
surface; we will use the convention that the vector 

$$\mathbf{n} = \mathbf{p}_u \times \mathbf{p}_v$$

points toward the outside of the surface. 
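The construction of a normal from two tangent vectors can be sketched numerically. The following Python example is only an illustration: it approximates the partial derivatives by central differences (an assumption made here for brevity; analytic derivatives are preferable when available), and the names `surface_normal`, `sphere`, and the step size `h` are hypothetical.

```python
import math

def cross(a, b):
    """Cross product of two 3D vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def surface_normal(p, u, v, h=1e-5):
    """Approximate n = p_u x p_v for a parametric surface p(u, v) -> (x, y, z)."""
    # Central differences stand in for the partial derivatives p_u and p_v.
    p_u = tuple((hi - lo) / (2 * h) for hi, lo in zip(p(u + h, v), p(u - h, v)))
    p_v = tuple((hi - lo) / (2 * h) for hi, lo in zip(p(u, v + h), p(u, v - h)))
    return cross(p_u, p_v)

def sphere(theta, phi, r=1.0):
    """The sphere of Equation (2.25), with (u, v) taken to be (theta, phi)."""
    return (r * math.cos(phi) * math.sin(theta),
            r * math.sin(phi) * math.sin(theta),
            r * math.cos(theta))
```

With the parameter order $(u, v) = (\theta, \phi)$ chosen here, the computed normal of the unit sphere is $\sin\theta$ times the position vector, so it points outward for $0 < \theta < \pi$. Swapping the two parameters would flip the normal, which is why the convention matters.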


2.5.9 Summary of Curves and Surfaces 

Implicit curves in 2D or surfaces in 3D are defined by scalar-valued functions of
two or three variables, $f : \mathbb{R}^2 \to \mathbb{R}$ or $f : \mathbb{R}^3 \to \mathbb{R}$, and the surface consists of all
points where the function is zero:

$$S = \{\mathbf{p} \mid f(\mathbf{p}) = 0\}.$$

Parametric curves in 2D or 3D are defined by vector-valued functions of one
variable, $\mathbf{p} : D \subset \mathbb{R} \to \mathbb{R}^2$ or $\mathbf{p} : D \subset \mathbb{R} \to \mathbb{R}^3$, and the curve is swept out as $t$
varies over all of $D$:

$$S = \{\mathbf{p}(t) \mid t \in D\}.$$

Parametric surfaces in 3D are defined by vector-valued functions of two variables,
$\mathbf{p} : D \subset \mathbb{R}^2 \to \mathbb{R}^3$, and the surface consists of the images of all points $(u, v)$ in
the domain:

$$S = \{\mathbf{p}(u, v) \mid (u, v) \in D\}.$$

For implicit curves and surfaces, the normal vector is given by the derivative
of $f$ (the gradient), and the tangent vector (for a curve) or vectors (for a surface)
can be derived from the normal by constructing a basis.

For parametric curves and surfaces, the derivative of $\mathbf{p}$ gives the tangent vector
(for a curve) or vectors (for a surface), and the normal vector can be derived from
the tangents by constructing a basis.


2.6 Linear Interpolation

Perhaps the most common mathematical operation in graphics is linear
interpolation. We have already seen an example of linear interpolation of position to
form line segments in 2D and 3D, where two points $\mathbf{a}$ and $\mathbf{b}$ are associated with
a parameter $t$ to form the line $\mathbf{p} = (1 - t)\mathbf{a} + t\mathbf{b}$. This is interpolation because $\mathbf{p}$
goes through $\mathbf{a}$ and $\mathbf{b}$ exactly at $t = 0$ and $t = 1$. It is linear interpolation because
the weighting terms $t$ and $1 - t$ are linear polynomials of $t$.

Another common linear interpolation is among a set of positions on the
x-axis: $x_0, x_1, \ldots, x_n$, and for each $x_i$ we have an associated height, $y_i$. We want to
create a continuous function $y = f(x)$ that interpolates these positions, so that $f$
goes through every data point, i.e., $f(x_i) = y_i$. For linear interpolation, the points
$(x_i, y_i)$ are connected by straight line segments. It is natural to use parametric
line equations for these segments. The parameter $t$ is just the fractional distance
between $x_i$ and $x_{i+1}$:

$$f(x) = y_i + \frac{x - x_i}{x_{i+1} - x_i}\,(y_{i+1} - y_i). \tag{2.27}$$

Because the weighting functions are linear polynomials of x, this is linear inter¬ 
polation. 

The two examples above have the common form of linear interpolation. We
create a variable $t$ that varies from 0 to 1 as we move from data item $A$ to data
item $B$. Intermediate values are just the function $(1 - t)A + tB$. Notice that
Equation (2.27) has this form with

$$t = \frac{x - x_i}{x_{i+1} - x_i}.$$
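The common form $(1 - t)A + tB$ translates directly into code. The sketch below is one possible Python rendering of Equation (2.27); the names `lerp` and `piecewise_linear` are assumptions, and using a binary search to locate the segment is a design choice made here, not something the text prescribes.

```python
from bisect import bisect_right

def lerp(a, b, t):
    """Linear interpolation: returns a at t = 0 and b at t = 1."""
    return (1.0 - t) * a + t * b

def piecewise_linear(xs, ys, x):
    """Evaluate Equation (2.27) for data points (xs[i], ys[i]).

    Assumes xs is strictly increasing and xs[0] <= x <= xs[-1].
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two matching data points")
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the interpolated range")
    # Index i of the segment [xs[i], xs[i+1]] containing x; the min() keeps
    # x == xs[-1] inside the last segment.
    i = min(bisect_right(xs, x) - 1, len(xs) - 2)
    t = (x - xs[i]) / (xs[i + 1] - xs[i])  # fractional distance along the segment
    return lerp(ys[i], ys[i + 1], t)
```

The same `lerp` works unchanged for any quantity that supports scaling and addition, which is why this form recurs throughout graphics code.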


2.7 Triangles 

Triangles in both 2D and 3D are the fundamental modeling primitive in many
graphics programs. Often information such as color is tagged onto triangle
vertices, and this information is interpolated across the triangle. The coordinate
system that makes such interpolation straightforward is called barycentric
coordinates; we will develop these from scratch. We will also discuss 2D triangles,
which must be understood before we can draw their pictures on 2D screens.


2.7.1 2D Triangles

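If we have a 2D triangle defined by 2D points $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$, we can first find its
area:

$$
\begin{aligned}
\text{area} &= \frac{1}{2}
\begin{vmatrix}
x_b - x_a & x_c - x_a \\
y_b - y_a & y_c - y_a
\end{vmatrix} \\
&= \frac{1}{2}\left(x_a y_b + x_b y_c + x_c y_a - x_a y_c - x_b y_a - x_c y_b\right).
\end{aligned}
\tag{2.28}
$$

The derivation of this formula can be found in Section 5.3. This area will have a
positive sign if the points $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$ are in counterclockwise order and a negative
sign otherwise.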
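A short Python sketch of Equation (2.28) makes the sign convention concrete. The function name `signed_area` is an assumption, and points are represented as `(x, y)` tuples for illustration.

```python
def signed_area(a, b, c):
    """Signed area of the 2D triangle (a, b, c), following Equation (2.28).

    Positive for counterclockwise vertex order, negative for clockwise,
    and zero when the three points are collinear.
    """
    return 0.5 * ((b[0] - a[0]) * (c[1] - a[1])
                  - (c[0] - a[0]) * (b[1] - a[1]))
```

For the triangle with vertices (0, 0), (1, 0), (0, 1) the result is 0.5; swapping any two vertices reverses the orientation and negates the result.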

Often in graphics, we wish to assign a property, such as color, at each trian¬ 
gle vertex and smoothly interpolate the value of that property across the triangle. 
There are a variety of ways to do this, but the simplest is to use barycentric co¬ 
ordinates. One way to think of barycentric coordinates is as a non-orthogonal 
coordinate system as was discussed briefly in Section 2.4.2. Such a coordinate 
system is shown in Figure 2.36, where the coordinate origin is a and the vectors 
from a to b and c are the basis vectors. With that origin and those basis vectors, 
any point p can be written as 


$$\mathbf{p} = \mathbf{a} + \beta(\mathbf{b} - \mathbf{a}) + \gamma(\mathbf{c} - \mathbf{a}). \tag{2.29}$$

Figure 2.36. A 2D triangle with vertices a, b, c can be used to set up a non-orthogonal
coordinate system with origin a and basis vectors (b - a) and (c - a). A point is then
represented by an ordered pair $(\beta, \gamma)$. For example, the point p = (2.0, 0.5), i.e., p = a
+ 2.0 (b - a) + 0.5 (c - a).

Note that we can reorder the terms in Equation (2.29) to get

$$\mathbf{p} = (1 - \beta - \gamma)\mathbf{a} + \beta\mathbf{b} + \gamma\mathbf{c}.$$

Often people define a new variable $\alpha$ to improve the symmetry of the equations:

$$\alpha = 1 - \beta - \gamma,$$

which yields the equation

$$\mathbf{p}(\alpha, \beta, \gamma) = \alpha\mathbf{a} + \beta\mathbf{b} + \gamma\mathbf{c}, \tag{2.30}$$

with the constraint that

$$\alpha + \beta + \gamma = 1. \tag{2.31}$$

Barycentric coordinates seem like an abstract and unintuitive construct at first, 
but they turn out to be powerful and convenient. You may find it useful to think 
of how street addresses would work in a city where there are two sets of parallel 
streets, but where those sets are not at right angles. The natural system would 
essentially be barycentric coordinates, and you would quickly get used to them. 
Barycentric coordinates are defined for all points on the plane. A particularly nice 
feature of barycentric coordinates is that a point p is inside the triangle formed by 
a, b, and c if and only if 


$$
\begin{gathered}
0 < \alpha < 1, \\
0 < \beta < 1, \\
0 < \gamma < 1.
\end{gathered}
$$

If one of the coordinates is zero and the other two are between zero and one, then 
you are on an edge. If two of the coordinates are zero, then the other is one, 
and you are at a vertex. Another nice property of barycentric coordinates is that 
Equation (2.30) in effect mixes the coordinates of the three vertices in a smooth 
way. The same mixing coefficients $(\alpha, \beta, \gamma)$ can be used to mix other properties,
such as color, as we will see in the next chapter. 

Given a point $\mathbf{p}$, how do we compute its barycentric coordinates? One way is
to write Equation (2.29) as a linear system with unknowns $\beta$ and $\gamma$, solve, and set
$\alpha = 1 - \beta - \gamma$. That linear system is

$$
\begin{bmatrix}
x_b - x_a & x_c - x_a \\
y_b - y_a & y_c - y_a
\end{bmatrix}
\begin{bmatrix}
\beta \\
\gamma
\end{bmatrix}
=
\begin{bmatrix}
x_p - x_a \\
y_p - y_a
\end{bmatrix}.
\tag{2.32}
$$

Although it is straightforward to solve Equation (2.32) algebraically, it is often
fruitful to compute a direct geometric solution.










One geometric property of barycentric coordinates is that they are the signed
scaled distance from the lines through the triangle sides, as is shown for $\beta$ in
Figure 2.37. Recall from Section 2.5.2 that evaluating the equation $f(x, y)$ for the
line $f(x, y) = 0$ returns the scaled signed distance from $(x, y)$ to the line. Also
recall that if $f(x, y) = 0$ is the equation for a particular line, so is $kf(x, y) = 0$
for any non-zero $k$. Changing $k$ scales the distance and controls which side of the
line has positive signed distance, and which negative. We would like to choose
$k$ such that, for example, $kf(x, y) = \beta$. Since $k$ is only one unknown, we can
force this with one constraint, namely that at point $\mathbf{b}$ we know $\beta = 1$. So if the
line $f_{ac}(x, y) = 0$ goes through both $\mathbf{a}$ and $\mathbf{c}$, then we can compute $\beta$ for a point
$(x, y)$ as follows:

$$\beta = \frac{f_{ac}(x, y)}{f_{ac}(x_b, y_b)}, \tag{2.33}$$

and we can compute $\gamma$ and $\alpha$ in a similar fashion. For efficiency, it is usually wise
to compute only two of the barycentric coordinates directly and to compute the
third using Equation (2.31).

To find this “ideal” form for the line through $\mathbf{p}_0$ and $\mathbf{p}_1$, we can first use the
technique of Section 2.5.2 to find some valid implicit lines through the vertices.
Equation (2.18) gives us

$$f_{ab}(x, y) = (y_a - y_b)x + (x_b - x_a)y + x_a y_b - x_b y_a = 0.$$

Note that $f_{ab}(x_c, y_c)$ probably does not equal one, so it is probably not the ideal
form we seek. By dividing through by $f_{ab}(x_c, y_c)$ we get

$$\gamma = \frac{(y_a - y_b)x + (x_b - x_a)y + x_a y_b - x_b y_a}{(y_a - y_b)x_c + (x_b - x_a)y_c + x_a y_b - x_b y_a}.$$

The presence of the division might worry us because it introduces the possibility
of divide-by-zero, but this cannot occur for triangles with areas that are not near
zero. There are analogous formulas for $\alpha$ and $\beta$, but typically only one is needed:

$$
\begin{aligned}
\beta &= \frac{(y_a - y_c)x + (x_c - x_a)y + x_a y_c - x_c y_a}{(y_a - y_c)x_b + (x_c - x_a)y_b + x_a y_c - x_c y_a}, \\
\alpha &= 1 - \beta - \gamma.
\end{aligned}
$$
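The implicit-line formulas for $\gamma$ and $\beta$ can be sketched in Python as follows. The helper names `implicit_line`, `barycentric`, and `is_inside` are hypothetical, points are `(x, y)` tuples, and rejecting only an exactly zero denominator is a simplification; a robust implementation would compare against a tolerance.

```python
def implicit_line(p, q):
    """Return f(x, y), the implicit form of the line through points p and q."""
    (xp, yp), (xq, yq) = p, q
    def f(x, y):
        return (yp - yq) * x + (xq - xp) * y + xp * yq - xq * yp
    return f

def barycentric(a, b, c, p):
    """Barycentric coordinates (alpha, beta, gamma) of p in triangle (a, b, c)."""
    f_ab = implicit_line(a, b)
    f_ac = implicit_line(a, c)
    denom_gamma = f_ab(*c)  # normalizer chosen so that gamma = 1 at vertex c
    denom_beta = f_ac(*b)   # normalizer chosen so that beta = 1 at vertex b
    if denom_gamma == 0 or denom_beta == 0:
        raise ValueError("degenerate (zero-area) triangle")
    gamma = f_ab(*p) / denom_gamma
    beta = f_ac(*p) / denom_beta
    alpha = 1.0 - beta - gamma  # Equation (2.31)
    return alpha, beta, gamma

def is_inside(alpha, beta, gamma):
    """Strict interior test: all three coordinates lie in (0, 1)."""
    return 0 < alpha < 1 and 0 < beta < 1 and 0 < gamma < 1
```

Only two coordinates are computed by division; the third comes from the constraint that the three sum to one, as the text recommends.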

Figure 2.37. The barycentric coordinate $\beta$ is the signed scaled distance
from the line through a and c.

Figure 2.38. The barycentric coordinates are proportional to the areas of the
three subtriangles shown.

Figure 2.39. The area of each of the two triangles shown is half of base times
height, so the areas are the same, as is the area of any triangle with a vertex on the
$\beta = 0.5$ line. The height, and thus the area, is proportional to $\beta$.

Another way to compute barycentric coordinates is to compute the areas $A_a$, $A_b$,
and $A_c$ of subtriangles as shown in Figure 2.38. Barycentric coordinates obey
the rule

$$
\begin{aligned}
\alpha &= A_a / A, \\
\beta &= A_b / A, \\
\gamma &= A_c / A,
\end{aligned}
\tag{2.34}
$$

where $A$ is the area of the triangle. Note that $A = A_a + A_b + A_c$, so it can be
computed with two additions rather than a full area formula. This rule still holds
for points outside the triangle if the areas are allowed to be signed. The reason
for this is shown in Figure 2.39. Note that these are signed areas and will be
computed correctly as long as the same signed area computation is used for both
$A$ and the subtriangles $A_a$, $A_b$, and $A_c$.
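The area rule can be sketched in the same way. In this illustrative Python version (the names `signed_area` and `barycentric_by_area` are assumptions), each subtriangle is formed by substituting the point for one vertex while keeping the vertex order, so that every area carries a consistent sign.

```python
def signed_area(a, b, c):
    """Signed area of the 2D triangle (a, b, c); positive if counterclockwise."""
    return 0.5 * ((b[0] - a[0]) * (c[1] - a[1])
                  - (c[0] - a[0]) * (b[1] - a[1]))

def barycentric_by_area(a, b, c, p):
    """Equation (2.34): ratios of signed subtriangle areas to the full area."""
    area_a = signed_area(p, b, c)  # p replaces a
    area_b = signed_area(a, p, c)  # p replaces b
    area_c = signed_area(a, b, p)  # p replaces c
    area = area_a + area_b + area_c  # equals the full signed area
    if area == 0:
        raise ValueError("degenerate (zero-area) triangle")
    return area_a / area, area_b / area, area_c / area
```

Because the same signed-area routine is used for every term, the result stays valid for points outside the triangle, where one or more coordinates become negative.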


2.7.2 3D Triangles 

One wonderful thing about barycentric coordinates is that they extend almost 
transparently to 3D. If we assume the points a, b, and c are 3D, then we can 
still use the representation 


$$\mathbf{p} = (1 - \beta - \gamma)\mathbf{a} + \beta\mathbf{b} + \gamma\mathbf{c}.$$



Figure 2.40. The nor¬ 
mal vector of the triangle is 
perpendicular to all vectors 
in the plane of the triangle, 
and thus perpendicular to 
the edges of the triangle. 


Now, as we vary $\beta$ and $\gamma$, we sweep out a plane.

The normal vector to a triangle can be found by taking the cross product of 
any two vectors in the plane of the triangle (Figure 2.40). It is easiest to use two 
of the three edges as these vectors, for example, 

$$\mathbf{n} = (\mathbf{b} - \mathbf{a}) \times (\mathbf{c} - \mathbf{a}). \tag{2.35}$$

Note that this normal vector is not necessarily of unit length, and it obeys the 
right-hand rule of cross products. 

The area of the triangle can be found by taking the length of the cross product: 

$$\text{area} = \frac{1}{2}\,\|(\mathbf{b} - \mathbf{a}) \times (\mathbf{c} - \mathbf{a})\|. \tag{2.36}$$

Note that this is not a signed area, so it cannot be used directly to evaluate barycen¬ 
tric coordinates. However, we can observe that a triangle with a “clockwise” ver¬ 
tex order will have a normal vector that points in the opposite direction to the 
normal of a triangle in the same plane with a “counterclockwise” vertex order. 
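Recall that

$$\mathbf{a} \cdot \mathbf{b} = \|\mathbf{a}\| \, \|\mathbf{b}\| \cos\phi,$$

where $\phi$ is the angle between the vectors. If $\mathbf{a}$ and $\mathbf{b}$ are parallel, then $\cos\phi = \pm 1$,
and this gives a test of whether the vectors point in the same or opposite directions.
This, along with Equations (2.34), (2.35), and (2.36), suggests the formulas

$$
\begin{aligned}
\alpha &= \frac{\mathbf{n} \cdot \mathbf{n}_a}{\|\mathbf{n}\|^2}, \\
\beta &= \frac{\mathbf{n} \cdot \mathbf{n}_b}{\|\mathbf{n}\|^2}, \\
\gamma &= \frac{\mathbf{n} \cdot \mathbf{n}_c}{\|\mathbf{n}\|^2},
\end{aligned}
$$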


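where $\mathbf{n}$ is Equation (2.35) evaluated with vertices $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$; $\mathbf{n}_a$ is
Equation (2.35) evaluated with vertices $\mathbf{b}$, $\mathbf{c}$, and $\mathbf{p}$, and so on, i.e.,

$$
\begin{aligned}
\mathbf{n}_a &= (\mathbf{c} - \mathbf{b}) \times (\mathbf{p} - \mathbf{b}), \\
\mathbf{n}_b &= (\mathbf{a} - \mathbf{c}) \times (\mathbf{p} - \mathbf{c}), \\
\mathbf{n}_c &= (\mathbf{b} - \mathbf{a}) \times (\mathbf{p} - \mathbf{a}).
\end{aligned}
\tag{2.37}
$$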
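A minimal Python sketch of these 3D formulas follows. The vector helpers and the name `barycentric_3d` are assumptions made for illustration, and the code assumes $\mathbf{p}$ lies in the plane of the triangle; for points off the plane it returns the coordinates of the projection of $\mathbf{p}$ onto that plane.

```python
def sub(u, v):
    """Component-wise difference u - v of two 3D vectors."""
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def barycentric_3d(a, b, c, p):
    """Barycentric coordinates of p with respect to the 3D triangle (a, b, c)."""
    n = cross(sub(b, a), sub(c, a))    # Equation (2.35)
    n_a = cross(sub(c, b), sub(p, b))  # Equation (2.37)
    n_b = cross(sub(a, c), sub(p, c))
    n_c = cross(sub(b, a), sub(p, a))
    nn = dot(n, n)                     # squared length of n
    if nn == 0:
        raise ValueError("degenerate (zero-area) triangle")
    return dot(n, n_a) / nn, dot(n, n_b) / nn, dot(n, n_c) / nn
```

The dot product with $\mathbf{n}$ supplies the sign that the unsigned area of Equation (2.36) lacks: a subtriangle with reversed orientation has a normal opposite to $\mathbf{n}$ and contributes a negative coordinate.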


Frequently Asked Questions 

• Why isn’t there vector division? 

It turns out that there is no “nice” analog of division for vectors. However, it
is possible to motivate the quaternions by examining this question in detail (see
Hoffmann’s book referenced in the chapter notes).

• Is there something as clean as barycentric coordinates for polygons with 
more than three sides? 

Unfortunately there is not. Even convex quadrilaterals are much more compli¬ 
cated. This is one reason triangles are such a common geometric primitive in 
graphics. 

• Is there an implicit form for 3D lines? 

No. However, the intersection of two 3D planes defines a 3D line, so a 3D line 
can be described by two simultaneous implicit 3D equations.


Notes

The history of vector analysis is particularly interesting. It was largely invented
by Grassmann in the mid-1800s but was ignored and reinvented later (Crowe,
1994). Grassmann now has a following in the graphics field of researchers who
are developing Geometric Algebra based on some of his ideas (Doran & Lasenby, 
2003). Readers interested in why the particular scalar and vector products are 
in some sense the right ones, and why we do not have a commonly-used vector 
division, will find enlightenment in the concise About Vectors (Hoffmann, 1975). 
Another important geometric tool is the quaternion invented by Hamilton in the 
mid-1800s. Quaternions are useful in many situations, but especially where ori¬ 
entations are concerned (Hanson, 2005). 


Exercises 

1. The cardinality of a set is the number of elements it contains. Under IEEE
floating point representation (Section 1.5), what is the cardinality of the
floats?

2. Is it possible to implement a function that maps 32-bit integers to 64-bit
integers that has a well-defined inverse? Do all functions from 32-bit integers
to 64-bit integers have well-defined inverses?

3. Specify the unit cube (x-, y-, and z-coordinates all between 0 and 1
inclusive) in terms of the Cartesian product of three intervals.

4. If you have access to the natural log function $\ln(x)$, specify how you could
use it to implement a $\log(b, x)$ function where $b$ is the base of the log. What
should the function do for negative $b$ values? Assume an IEEE floating
point implementation.

5. Solve the quadratic equation $2x^2 + 6x + 4 = 0$.

6. Implement a function that takes in coefficients $A$, $B$, and $C$ for the quadratic
equation $Ax^2 + Bx + C = 0$ and computes the two solutions. Have the
function return the number of valid (not NaN) solutions and fill in the return
arguments so the smaller of the two solutions is first.

7. Show that the two forms of the quadratic formula on page 17 are equivalent
(assuming exact arithmetic) and explain how to choose one for each root in
order to avoid subtracting nearly equal floating point numbers, which leads
to loss of precision.

8. Show by counterexample that it is not always true that for 3D vectors $\mathbf{a}$, $\mathbf{b}$,
and $\mathbf{c}$, $\mathbf{a} \times (\mathbf{b} \times \mathbf{c}) = (\mathbf{a} \times \mathbf{b}) \times \mathbf{c}$.

9. Given the non-parallel 3D vectors $\mathbf{a}$ and $\mathbf{b}$, compute a right-handed
orthonormal basis such that $\mathbf{u}$ is parallel to $\mathbf{a}$ and $\mathbf{v}$ is in the plane defined
by $\mathbf{a}$ and $\mathbf{b}$.

10. What is the gradient of $f(x, y, z) = x^2 + y - 3z^3$?

11. What is a parametric form for the axis-aligned 2D ellipse?

12. What is the implicit equation of the plane through 3D points (1,0,0),
(0,1,0), and (0,0,1)? What is the parametric equation? What is the
normal vector to this plane?

13. Given four 2D points $\mathbf{a}_0$, $\mathbf{a}_1$, $\mathbf{b}_0$, and $\mathbf{b}_1$, design a robust procedure to
determine whether the line segments $\mathbf{a}_0\mathbf{a}_1$ and $\mathbf{b}_0\mathbf{b}_1$ intersect.

14. Design a robust procedure to compute the barycentric coordinates of a 2D 
point with respect to three 2D non-collinear points. 







3 Raster Images


Most computer graphics images are presented to the user on some kind of raster 
display. Raster displays show images as rectangular arrays of pixels. A common 
example is a flat-panel computer display or television, which has a rectangular 
array of small light-emitting pixels that can individually be set to different colors 
to create any desired image. Different colors are achieved by mixing varying 
intensities of red, green, and blue light. Most printers, such as laser printers and 
ink-jet printers, are also raster devices. They are based on scanning: there is no 
physical grid of pixels, but the image is laid down sequentially by depositing ink 
at selected points on a grid. 

Rasters are also prevalent in input devices for images. A digital camera con¬ 
tains an image sensor comprising a grid of light-sensitive pixels, each of which 
records the color and intensity of light falling on it. A desktop scanner contains a 
linear array of pixels that is swept across the page being scanned, making many 
measurements per second to produce a grid of pixels. 

Because rasters are so prevalent in devices, raster images are the most com¬ 
mon way to store and process images. A raster image is simply a 2D array that 
stores the pixel value for each pixel—usually a color stored as three numbers, for 
red, green, and blue. A raster image stored in memory can be displayed by using 
each pixel in the stored image to control the color of one pixel of the display. 

But we don't always want to display an image this way. We might want to
change the size or orientation of the image, correct the colors, or even show the
image pasted on a moving three-dimensional surface. Even in televisions, the
display rarely has the same number of pixels as the image being displayed.
Considerations like these break the direct link between image pixels and display pixels.
It’s best to think of a raster image as a device-independent description of the
image to be displayed, and the display device as a way of approximating that ideal
image.


Pixel is short for “picture
element.”


Color in printers is more
complicated, involving
mixtures of at least four
pigments.


Or maybe it’s because
raster images are so
convenient that raster devices
are prevalent.


Or: you have to know what
those numbers in your
image actually mean.

There are other ways of describing images besides using arrays of pixels. 
A vector image is described by storing descriptions of shapes —areas of color 
bounded by lines or curves—with no reference to any particular pixel grid. In 
essence this amounts to storing the instructions for displaying the image rather 
than the pixels needed to display it. The main advantage of vector images is that 
they are resolution independent and can be displayed well on very high resolution 
devices. The corresponding disadvantage is that they must be rasterized before 
they can be displayed. Vector images are often used for text, diagrams, mechani¬ 
cal drawings, and other applications where crispness and precision are important 
and photographic images and complex shading aren’t needed. 

In this chapter, we discuss the basics of raster images and displays, paying 
particular attention to the nonlinearities of standard displays. The details of how 
pixel values relate to light intensities are important to have in mind when we 
discuss computing images in later chapters. 


3.1 Raster Devices 

Before discussing raster images in the abstract, it is instructive to look at the basic 
operation of some specific devices that use these images. A few familiar raster 
devices can be categorized into a simple hierarchy: 

• Output 

- Display 

* Transmissive: liquid crystal display (LCD) 

* Emissive: light emitting diode (LED) display 

- Hardcopy 

* Binary: ink-jet printer 

* Continuous tone: dye sublimation printer 

• Input 

- 2D array sensor: digital camera 

- 1D array sensor: flatbed scanner


3.1.1 Displays

Current displays, including televisions and digital cinematic projectors as well as 
displays and projectors for computers, are nearly universally based on fixed arrays 
of pixels. They can be separated into emissive displays, which use pixels that 
directly emit controllable amounts of light, and transmissive displays, in which
the pixels themselves don’t emit light but instead vary the amount of light that 
they allow to pass through them. Transmissive displays require a light source to 
illuminate them: in a direct-viewed display this is a backlight behind the array; 
in a projector it is a lamp that emits light that is projected onto the screen after 
passing through the array. An emissive display is its own light source. 

Light-emitting diode (LED) displays are an example of the emissive type. 
Each pixel is composed of one or more LEDs, which are semiconductor devices 
(based on inorganic or organic semiconductors) that emit light with intensity de¬ 
pending on the electrical current passing through them (see Figure 3.1). 

The pixels in a color display are divided into three independently controlled 
subpixels —one red, one green, and one blue—each with its own LED made us¬ 
ing different materials so that they emit light of different colors (Figure 3.2). 





Figure 3.1. The opera¬ 
tion of a light-emitting diode 
(LED) display. 



Figure 3.2. The red,
green, and blue subpixels
within a pixel of a flat-panel
display.

Figure 3.3. One pixel of an LCD display in the off state (bottom), in which the front polarizer
blocks all the light that passes the back polarizer, and the on state (top), in which the liquid
crystal cell rotates the polarization of the light so that it can pass through the front polarizer.
Figure courtesy Erik Reinhard (Reinhard et al., 2008).


Figure 3.4. The opera¬ 
tion of a liquid crystal dis¬ 
play (LCD). 


The resolution of a dis¬ 
play is sometimes called 
its “native resolution” since 
most displays can handle 
images of other resolutions, 
via built-in conversion. 


When the display is viewed from a distance, the eye can’t separate the individual 
subpixels, and the perceived color is a mixture of red, green, and blue. 

Liquid crystal displays (LCDs) are an example of the transmissive type. A 
liquid crystal is a material whose molecular structure enables it to rotate the po¬ 
larization of light that passes through it, and the degree of rotation can be adjusted 
by an applied voltage. An LCD pixel (Figure 3.3) has a layer of polarizing film 
behind it, so that it is illuminated by polarized light—let’s assume it is polarized 
horizontally. 

A second layer of polarizing film in front of the pixel is oriented to trans¬ 
mit only vertically polarized light. If the applied voltage is set so that the liquid 
crystal layer in between does not change the polarization, all light is blocked and 
the pixel is in the “off” (minimum intensity) state. If the voltage is set so that 
the liquid crystal rotates the polarization by 90 degrees, then all the light that en¬ 
tered through the back of the pixel will escape through the front, and the pixel 
is fully “on”—it has its maximum intensity. Intermediate voltages will partly 
rotate the polarization so that the front polarizer partly blocks the light, result¬ 
ing in intensities between the minimum and maximum (Figure 3.4). Like color 
LED displays, color LCDs have red, green, and blue subpixels within each pixel, 
which are three independent pixels with red, green, and blue color filters over 
them. 

Any type of display with a fixed pixel grid, including these and other tech¬ 
nologies, has a fundamentally fixed resolution determined by the size of the grid. 
For displays and images, resolution simply means the dimensions of the pixel 
grid: if a desktop monitor has a resolution of 1920 x 1200 pixels, this means that 
it has 2,304,000 pixels arranged in 1920 columns and 1200 rows. 

An image of a different resolution, to fill the screen, must be converted into a 
1920 x 1200 image using the methods of Chapter 9. 


3.1.2 Hardcopy Devices 



Figure 3.5. The operation 
of an ink-jet printer. 


The process of recording images permanently on paper has very different con¬ 
straints from showing images transiently on a display. In printing, pigments are 
distributed on paper or another medium so that when light reflects from the pa¬ 
per it forms the desired image. Printers are raster devices like displays, but many 
printers can only print binary images —pigment is either deposited or not at each 
grid position, with no intermediate amounts possible. 

An ink-jet printer (Figure 3.5) is an example of a device that forms a raster 
image by scanning. An ink-jet print head contains liquid ink carrying pigment, 
which can be sprayed in very small drops under electronic control. The head
moves across the paper, and drops are emitted as it passes grid positions that
should receive ink; no ink is emitted in areas intended to remain blank. After 
each sweep the paper is advanced slightly, and then the next row of the grid is laid 
down. Color prints are made by using several print heads, each spraying ink with a 
different pigment, so that each grid position can receive any combination of differ¬ 
ent colored drops. Because all drops are the same, an ink-jet printer prints binary 
images: at each grid point there is a drop or no drop; there are no intermediate 
shades. 

An ink-jet printer has no physical array of pixels; the resolution is deter¬ 
mined by how small the drops can be made and how far the paper is advanced 
after each sweep. Many ink-jet printers have multiple nozzles in the print head, 
enabling several sweeps to be made in one pass, but it is the paper advance, 
not the nozzle spacing, that ultimately determines the spacing of the rows. 

The thermal dye transfer process is an example of a continuous tone printing 
process, meaning that varying amounts of dye can be deposited at each pixel—it 
is not all-or-nothing like an ink-jet printer (Figure 3.6). A donor ribbon contain¬ 
ing colored dye is pressed between the paper, or dye receiver, and a print head 
containing a linear array of heating elements, one for each column of pixels in the 
image. As the paper and ribbon move past the head, the heating elements switch 
on and off to heat the ribbon in areas where dye is desired, causing the dye to dif¬ 
fuse from the ribbon to the paper. This process is repeated for each of several dye 
colors. Since higher temperatures cause more dye to be transferred, the amount of 
each dye deposited at each grid position can be controlled, allowing a continuous 
range of colors to be produced. The number of heating elements in the print head 
establishes a fixed resolution in the direction across the page, but the resolution 
along the page is determined by the rate of heating and cooling compared to the 
speed of the paper. 

Unlike displays, the resolution of printers is described in terms of the pixel 
density instead of the total count of pixels. So a thermal dye transfer printer that 
has elements spaced 300 per inch across its print head has a resolution of 300 
pixels per inch (ppi) across the page. If the resolution along the page is chosen 
to be the same we can simply say the printer’s resolution is 300 ppi. An ink-jet 
printer that places dots on a grid with 1200 grid points per inch is described as 
having a resolution of 1200 dots per inch (dpi). Because the ink-jet printer is a 
binary device, it requires a much finer grid for at least two reasons. Because edges
are abrupt black/white boundaries, very high resolution is required to keep
stair-stepping, or aliasing, from appearing (see Section 8.3). When continuous-tone
images are printed, the high resolution is required to simulate intermediate colors
by printing varying-density dot patterns called halftones.


There are also continuous 
ink-jet printers that print in 
a continuous helical path 
on paper wrapped around a 
spinning drum, rather than 
moving the head back and 
forth. 





Figure 3.6. The opera¬ 
tion of a thermal dye trans¬ 
fer printer. 


The term “dpi” is all too of¬ 
ten used to mean “pixels 
per inch,” but dpi should 
be used in reference to bi¬ 
nary devices and ppi in ref¬ 
erence to continuous-tone 
devices.



Figure 3.7. The operation 
of a digital camera.

Figure 3.8. Most color
digital cameras use a color- 
filter array similar to the 
Bayer mosaic shown here. 
Each pixel measures either 
red, green, or blue light. 


People who are selling
cameras use “mega” to
mean $10^6$, not $2^{20}$ as with
megabytes.


The resolution of a scanner 
is sometimes called its “op¬ 
tical resolution” since most 
scanners can produce im¬ 
ages of other resolutions, 
via built-in conversion. 


3.1.3 Input Devices 

Raster images have to come from somewhere, and any image that wasn't com¬ 
puted by some algorithm has to have been measured by some raster input device, 
most often a camera or scanner. Even in rendering images of 3D scenes, pho¬ 
tographs are used constantly as texture maps (see Chapter 11). A raster input 
device has to make a light measurement for each pixel, and (like output devices) 
they are usually based on arrays of sensors. 

A digital camera is an example of a 2D array input device. The image sensor 
in a camera is a semiconductor device with a grid of light-sensitive pixels. Two 
common types of arrays are known as CCDs (charge-coupled devices) and CMOS 
(complementary metal-oxide-semiconductor) image sensors. The camera’s lens
projects an image of the scene to be photographed onto the sensor, and then each 
pixel measures the light energy falling on it, ultimately resulting in a number that 
goes into the output image (Figure 3.7). In much the same way as color displays 
use red, green, and blue subpixels, most color cameras work by using a color-filter 
array or mosaic to allow each pixel to see only red, green, or blue light, leaving 
the image processing software to fill in the missing values in a process known as 
demosaicking (Figure 3.8). 
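To make the idea of a color-filter array concrete, the sketch below simulates what a mosaic sensor records from a full-color image. It is only an illustration: the RGGB arrangement is an assumed layout (real sensors vary in how they order the filters), the function names are hypothetical, and the image is a plain nested list of `(r, g, b)` tuples.

```python
def bayer_channel(row, col):
    """Channel index (0 = red, 1 = green, 2 = blue) seen by a pixel.

    Assumes an RGGB layout: even rows alternate R, G and odd rows alternate G, B.
    """
    if row % 2 == 0:
        return 0 if col % 2 == 0 else 1
    return 1 if col % 2 == 0 else 2

def mosaic(rgb_image):
    """Keep only the one channel each pixel's filter lets through."""
    return [[pixel[bayer_channel(r, c)] for c, pixel in enumerate(row)]
            for r, row in enumerate(rgb_image)]
```

Each output pixel holds a single number, so two thirds of the color data is missing; demosaicking is the step that estimates those missing values from neighboring pixels.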

Other cameras use three separate arrays, or three separate layers in the array, to 
measure independent red, green, and blue values at each pixel, producing a usable 
color image without further processing. The resolution of a camera is determined 
by the fixed number of pixels in the array and is usually quoted using the total 
count of pixels: a camera with an array of 3000 columns and 2000 rows produces 
an image of resolution 3000 x 2000, which has 6 million pixels, and is called a 
6 megapixel (MP) camera. It’s important to remember that a mosaic sensor does
not measure a complete color image, so a camera that measures the same number 
of pixels but with independent red, green, and blue measurements records more 
information about the image than one with a mosaic sensor. 

A flatbed scanner also measures red, green, and blue values for each of a grid 
of pixels, but like a thermal dye transfer printer it uses a 1D array that sweeps
across the page being scanned, making many measurements per second. The 
resolution across the page is fixed by the size of the array, and the resolution 
along the page is determined by the frequency of measurements compared to the 
speed at which the scan head moves. A color scanner has a $3 \times n_x$ array, where
$n_x$ is the number of pixels across the page, with the three rows covered by red,
green, and blue filters. With an appropriate delay between the times at which the 
three colors are measured, this allows three independent color measurements at 
each grid point. As with continuous-tone printers, the resolution of scanners is 
reported in pixels per inch (ppi).

With this concrete information about where our images come from and where
they will go, we'll now discuss images more abstractly, in the way we’ll use them 
in graphics algorithms. 


3.2 Images, Pixels, and Geometry 

We know that a raster image is a big array of pixels, each of which stores 
information about the color of the image at its grid point. We’ve seen what various output 
devices do with images we send to them and how input devices derive them from 
images formed by light in the physical world. But for computations in the 
computer we need a convenient abstraction that is independent of the specifics of any 
device, that we can use to reason about how to produce or interpret the values 
stored in images. 

When we measure or reproduce images, they take the form of two-dimensional 
distributions of light energy: the light emitted from the monitor as a function of 
position on the face of the display; the light falling on a camera’s image sensor 
as a function of position across the sensor’s plane; the reflectance, or fraction of 
light reflected (as opposed to absorbed) as a function of position on a piece of 
paper. So in the physical world, images are functions defined over two-dimensional 
areas, almost always rectangles. We can therefore abstract an image as a function 

$$I(x, y) : R \to V,$$

where $R \subset \mathbb{R}^2$ is a rectangular area and $V$ is the set of possible pixel values. The 
simplest case is an idealized grayscale image where each point in the rectangle 
has just a brightness (no color), and we can say $V = \mathbb{R}^+$ (the non-negative reals). 
An idealized color image, with red, green, and blue values at each pixel, has 
$V = (\mathbb{R}^+)^3$. We’ll discuss other possibilities for $V$ in the next section. 

How does a raster image relate to this abstract notion of a continuous image? 
Looking to the concrete examples, a pixel from a camera or scanner is a 
measurement of the average color of the image over some small area around the pixel. A 
display pixel, with its red, green, and blue subpixels, is designed so that the 
average color of the image over the face of the pixel is controlled by the corresponding 
pixel value in the raster image. In both cases, the pixel value is a local average 
of the color of the image, and it is called a point sample of the image. In other 
words, when we find the value $x$ in a pixel, it means “the value of the image in the 
vicinity of this grid point is $x$.” The idea of images as sampled representations of 
functions is explored further in Chapter 9. 

A mundane but important question is where the pixels are located in 2D space. 
This is only a matter of convention, but establishing a consistent convention is 
important! In this book, a raster image is indexed by the pair $(i, j)$ indicating the 
column ($i$) and row ($j$) of the pixel, counting from the bottom left. If an image 
has $n_x$ columns and $n_y$ rows of pixels, the bottom-left pixel is $(0, 0)$ and the 
top-right pixel is $(n_x - 1, n_y - 1)$. We need 2D real screen coordinates to specify 
pixel positions. We will place the pixels’ sample points at integer coordinates, as 
shown by the $4 \times 3$ screen in Figure 3.10. 


Figure 3.9. The operation of a flatbed scanner. 


“A pixel is not a little square!” 
- Alvy Ray Smith (A. R. Smith, 1995) 


Are there any raster devices that are not rectangular? 


Figure 3.10. Coordinates of a four pixel × three pixel screen. Note that in some APIs the 
y-axis will point downwards. 


In some APIs, and many file formats, the rows of an image are organized 
top-to-bottom, so that (0, 0) is at the top left. This is for historical reasons: the rows in 
analog television transmission started from the top. 


Some systems shift the coordinates by half a pixel to place the sample points 
halfway between the integers but place the edges of the image at integers. 

The rectangular domain of the image has width $n_x$ and height $n_y$ and is 
centered on this grid, meaning that it extends half a pixel beyond the last sample point 
on each side. So the rectangular domain of an $n_x \times n_y$ image is 

$$R = [-0.5, n_x - 0.5] \times [-0.5, n_y - 0.5].$$

Again, these coordinates are simply conventions, but they will be important 
to remember later when implementing cameras and viewing transformations. 
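
The convention above can be sketched in a few lines of Python. This is only an illustration: the function names `image_domain`, `pixel_containing`, and `flip_row` are hypothetical and do not come from the book, and the row flip is included only because some APIs put (0, 0) at the top left.

```python
import math


def image_domain(nx, ny):
    """Return ((x_min, x_max), (y_min, y_max)) of the rectangle R
    for an image with nx columns and ny rows."""
    return (-0.5, nx - 0.5), (-0.5, ny - 0.5)


def pixel_containing(x, y, nx, ny):
    """Return the index (i, j) of the pixel whose unit square contains
    the screen point (x, y), or None if the point lies outside R."""
    (x0, x1), (y0, y1) = image_domain(nx, ny)
    if not (x0 <= x < x1 and y0 <= y < y1):
        return None
    # Sample points sit at integer coordinates, so the nearest integer
    # identifies the pixel.
    return math.floor(x + 0.5), math.floor(y + 0.5)


def flip_row(j, ny):
    """Convert a bottom-up row index to a top-down one (assumed helper
    for APIs and file formats that count rows from the top)."""
    return ny - 1 - j
```

For the 4 × 3 screen of Figure 3.10, `image_domain(4, 3)` gives the x-range [-0.5, 3.5] and the y-range [-0.5, 2.5].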


3.2.1 Pixel Values 

So far we have described the values of pixels in terms of real numbers, 
representing intensity (possibly separately for red, green, and blue) at a point in the image. 
This suggests that images should be arrays of floating-point numbers, with either 
one (for grayscale, or black and white, images) or three (for RGB color images) 
32-bit floating point numbers stored per pixel. This format is sometimes used, 
when its precision and range of values are needed, but images have a lot of 
pixels and memory and bandwidth for storing and transmitting images are invariably 
scarce. Just one ten-megapixel photograph would consume about 115 MB of 
RAM in this format. 
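
The storage figure can be checked with a little arithmetic. The sketch below assumes that "ten megapixels" means exactly 10,000,000 pixels and that a megabyte is counted as $2^{20}$ bytes; both are assumptions about how the number was obtained, not statements from the text.

```python
# Storage for a ten-megapixel RGB image with 32-bit floats per channel.
pixels = 10_000_000          # assumed: "ten megapixels" = 10^7 pixels
channels = 3                 # red, green, blue
bytes_per_float = 4          # 32-bit floating point

total_bytes = pixels * channels * bytes_per_float

# Assumed: a megabyte counted as 2^20 bytes.
mib = total_bytes / 2**20

print(total_bytes, "bytes")
print(f"{mib:.1f} MB when 1 MB = 2^20 bytes")
```

Under these assumptions the total is 120,000,000 bytes, which falls between 114 and 115 MB when divided by $2^{20}$.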

Less range is required for images that are meant to be displayed directly. 
While the range of possible light intensities is unbounded in principle, any given 
device has a decidedly finite maximum intensity, so in many contexts it is 
perfectly sufficient for pixels to have a bounded range, usually taken to be $[0, 1]$ for 
simplicity. For instance, the possible values in an 8-bit image are 
$0, 1/255, 2/255, \ldots, 254/255, 1$. Images stored with floating-point numbers, allowing a wide 
range of values, are often called high dynamic range (HDR) images to distinguish 
them from fixed-range, or low dynamic range (LDR) images that are stored with 
integers. See Chapter 23 for an in-depth discussion of techniques and applications 
for high dynamic range images. 

Here are some pixel formats with typical applications: 

• 1-bit grayscale: text and other images where intermediate grays are not 
desired (high resolution required); 

• 8-bit RGB fixed-range color (24 bits total per pixel): web and email 
applications, consumer photographs; 

• 8- or 10-bit fixed-range RGB (24-30 bits/pixel): digital interfaces to 
computer displays; 

• 12- to 14-bit fixed-range RGB (36-42 bits/pixel): raw camera images for 
professional photography; 

• 16-bit fixed-range RGB (48 bits/pixel): professional photography and 
printing; intermediate format for image processing of fixed-range images; 

• 16-bit fixed-range grayscale (16 bits/pixel): radiology and medical 
imaging; 

• 16-bit “half-precision” floating-point RGB: HDR images; intermediate 
format for real-time rendering; 

• 32-bit floating-point RGB: general-purpose intermediate format for 
software rendering and processing of HDR images. 


Why 115 MB and not 120 MB? 


The denominator of 255, rather than 256, is awkward, but being able to 
represent 0 and 1 exactly is important. 


Reducing the number of bits used to store each pixel leads to two 
distinctive types of artifacts, or artificially introduced flaws, in images. First, encoding 
images with fixed-range values produces clipping when pixels that would 
otherwise be brighter than the maximum value are set, or clipped, to the maximum 
representable value. For instance, a photograph of a sunny scene may include 
reflections that are much brighter than white surfaces; these will be clipped (even if 
they were measured by the camera) when the image is converted to a fixed range 
to be displayed. Second, encoding images with limited precision leads to 
quantization artifacts, or banding, when the need to round pixel values to the nearest 
representable value introduces visible jumps in intensity or color. Banding can 
be particularly insidious in animation and video, where the bands may not be 
objectionable in still images but become very visible when they move back and 
forth. 
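
Both artifacts are easy to reproduce numerically. The following is a minimal sketch, assuming pixel values are plain Python floats and that rounding to the nearest level is the quantization rule; the helper names `clip`, `quantize`, and `dequantize` are hypothetical.

```python
def clip(value, lo=0.0, hi=1.0):
    """Clamp a value to the fixed range [lo, hi]; anything brighter
    than hi is lost (clipping)."""
    return max(lo, min(hi, value))


def quantize(value, bits=8):
    """Map a value to the nearest integer level in 0 .. 2^bits - 1
    after clipping it to [0, 1]."""
    levels = (1 << bits) - 1
    return int(clip(value) * levels + 0.5)


def dequantize(level, bits=8):
    """Map an integer level back to the fixed range [0, 1]."""
    return level / ((1 << bits) - 1)


# A bright reflection (2.7 times "white") is clipped to the top level.
print(quantize(2.7))

# A smooth ramp stored with only 3 bits collapses to 8 distinct levels,
# which is the source of visible banding.
ramp = [i / 999 for i in range(1000)]
print(len({quantize(v, bits=3) for v in ramp}))
```

With more bits per pixel the steps between adjacent levels shrink, which is why the 16-bit formats in the list above are preferred as intermediate formats.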


3.2.2 Monitor Intensities and Gamma 

All modern monitors take digital input for the “value” of a pixel and convert this 
to an intensity level. Real monitors have some non-zero intensity when they are 
off because the screen reflects some light. For our purposes we can consider this 
“black” and the monitor fully on as “white.” We assume a numeric description 
of pixel color that ranges from zero to one. Black is zero, white is one, and a 
gray halfway between black and white is 0.5. Note that here “halfway” refers to 
the physical amount of light coming from the pixel, rather than the appearance. 
The human perception of intensity is non-linear and will not be part of the present 
discussion; see Chapter 22 for more. 

There are two key issues that must be understood to produce correct images 
on monitors. The first is that monitors are non-linear with respect to input. For 
example, if you give a monitor 0, 0.5, and 1.0 as inputs for three pixels, the 
intensities displayed might be 0, 0.25, and 1.0 (off, one-quarter of fully on, and 
fully on). As an approximate characterization of this non-linearity, monitors are 
commonly characterized by a $\gamma$ (“gamma”) value. This value is the degree of 
freedom in the formula 

$$\text{displayed intensity} = (\text{maximum intensity})\, a^{\gamma}, \tag{3.1}$$

where $a$ is the input pixel value between zero and one. For example, if a monitor 
has a gamma of 2.0, and we input a value of $a = 0.5$, the displayed intensity 
will be one fourth the maximum possible intensity because $0.5^2 = 0.25$. Note 
that $a = 0$ maps to zero intensity and $a = 1$ maps to the maximum intensity 
regardless of the value of $\gamma$. Describing a display’s non-linearity using $\gamma$ is only 
an approximation; we do not need a great deal of accuracy in estimating the $\gamma$ of 
a device. A nice visual way to gauge the non-linearity is to find what value of $a$ 
gives an intensity halfway between black and white. This $a$ satisfies 

$$0.5 = a^{\gamma}.$$

If we can find that $a$, we can deduce $\gamma$ by taking logarithms on both sides: 

$$\gamma = \frac{\ln 0.5}{\ln a}.$$


We can find this a by a standard technique where we display a checkerboard 
pattern of black and white pixels next to a square of gray pixels with input a 
(Figure 3.11), then ask the user to adjust a (with a slider, for instance) until the two 
sides match in average brightness. When you look at this image from a distance 
(or without glasses if you are nearsighted), the two sides of the image will look 
about the same when a is producing an intensity halfway between black and white. 
This is because the blurred checkerboard is mixing even numbers of white and 
black pixels so the overall effect is a uniform color halfway between white and 
black. 
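
Once the user has matched the gray square to the checkerboard, turning the matched input value into a gamma estimate is a single logarithm ratio. The sketch below assumes the matched value is available as a float strictly between 0 and 1; the function name `estimate_gamma` is hypothetical.

```python
import math


def estimate_gamma(a):
    """Estimate a display's gamma from the input value a whose gray
    matches the black-and-white checkerboard, i.e. 0.5 = a ** gamma."""
    if not 0.0 < a < 1.0:
        raise ValueError("matched value must lie strictly between 0 and 1")
    return math.log(0.5) / math.log(a)


# If the match happens at a = 0.5, the display is linear (gamma = 1).
print(estimate_gamma(0.5))

# A match near a = 0.73 corresponds to a gamma of roughly 2.2.
print(round(estimate_gamma(0.73), 2))
```

Because the match is made by eye, the result is only as accurate as the user's judgment, which is consistent with the remark that great accuracy is not needed.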

Once we know $\gamma$, we can gamma correct our input so that a value of $a = 0.5$ 
is displayed with intensity halfway between black and white. This is done with 
the transformation 

$$a' = a^{1/\gamma}.$$

When this formula is plugged into Equation (3.1) we get 

$$\text{displayed intensity} = (a')^{\gamma}\,(\text{maximum intensity}) = \left(a^{1/\gamma}\right)^{\gamma} (\text{maximum intensity}) = a\,(\text{maximum intensity}).$$
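
The cancellation can be confirmed numerically. This is a minimal sketch that models the display with Equation (3.1) exactly, which a real monitor only approximates; the function names are hypothetical.

```python
def displayed_intensity(a, gamma, max_intensity=1.0):
    """Model of the display: Equation (3.1)."""
    return max_intensity * a ** gamma


def gamma_correct(a, gamma):
    """Pre-distort the input so the display's non-linearity cancels."""
    return a ** (1.0 / gamma)


gamma = 2.2  # assumed example value for the display
a = 0.5

# Without correction, the mid-gray input is displayed much too dark.
print(displayed_intensity(a, gamma))

# With correction, the displayed intensity is the intended 0.5
# (up to floating-point rounding).
print(displayed_intensity(gamma_correct(a, gamma), gamma))
```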



Figure 3.11. Alternating black and white pixels viewed from a distance are 
halfway between black and white. The gamma of a monitor can be inferred by 
finding a gray value that appears to have the same intensity as the black and 
white pattern. 


For monitors with analog interfaces, which have difficulty changing intensity 
rapidly along the horizontal direction, horizontal black and white stripes work 
better than a checkerboard. 


Another important characteristic of real displays is that they take quantized input 
values. So while we can manipulate intensities in the floating point range $[0, 1]$, 
the detailed input to a monitor is a fixed-size integer. The most common range for 
this integer is 0-255, which can be held in 8 bits of storage. This means that the 
possible values for $a$ are not any number in $[0, 1]$ but instead 

$$\text{possible values for } a = \left\{ \frac{0}{255}, \frac{1}{255}, \frac{2}{255}, \ldots, \frac{254}{255}, \frac{255}{255} \right\}.$$

This means the possible displayed intensity values are approximately 

$$\left\{ M\left(\frac{0}{255}\right)^{\gamma}, M\left(\frac{1}{255}\right)^{\gamma}, M\left(\frac{2}{255}\right)^{\gamma}, \ldots, M\left(\frac{254}{255}\right)^{\gamma}, M\left(\frac{255}{255}\right)^{\gamma} \right\},$$

where $M$ is the maximum intensity. In applications where the exact intensities 
need to be controlled, we would have to actually measure the 256 possible 
intensities, and these intensities might be different at different points on the screen, 
especially for CRTs. They might also vary with viewing angle. Fortunately few 
applications require such accurate calibration. 


In grade school you probably learned that the primaries are red, yellow, and 
blue, and that, e.g., yellow + blue = green. This is subtractive color mixing, 
which is fundamentally different from the more familiar additive mixing that 
happens in displays. 


Figure 3.12. The additive mixing rules for colors red/green/blue. 


3.3 RGB Color 

Most computer graphics images are defined in terms of red-green-blue (RGB) 
color. RGB color is a simple space that allows straightforward conversion to 
the controls for most computer screens. In this section RGB color is discussed 
from a user’s perspective, and operational facility is the goal. A more thorough 
discussion of color is given in Chapter 21, but the mechanics of RGB color space 
will allow us to write most graphics programs. The basic idea of RGB color space 
is that the color is displayed by mixing three primary lights: one red, one green, 
and one blue. The lights mix in an additive manner. 

In RGB additive color mixing we have (Figure 3.12): 

red + green = yellow 
green + blue = cyan 
blue + red = magenta 
red + green + blue = white. 

The color “cyan” is a blue-green, and the color “magenta” is a purple. 

If we are allowed to dim the primary lights from fully off (indicated by pixel 
value 0) to fully on (indicated by 1), we can create all the colors that can be 
displayed on an RGB monitor. The red, green, and blue pixel values create a 
three-dimensional RGB color cube that has a red, a green, and a blue axis. 
Allowable coordinates for the axes range from zero to one. The color cube is shown 
graphically in Figure 3.13. 


Figure 3.13. The RGB color cube in 3D and its faces unfolded. Any RGB color is a point in 
the cube. (See also Plate I.) 

The colors at the corners of the cube are: 

black = (0,0,0) 
red = (1,0,0) 
green = (0,1,0) 
blue = (0,0,1) 
yellow = (1,1, 0) 
magenta = (1,0,1) 
cyan = (0,1,1) 
white = (1,1,1). 

Actual RGB levels are often given in quantized form, just like the grayscales 
discussed in Section 3.2.2. Each component is specified with an integer. The 
most common size for these integers is one byte each, so each of the three RGB 
components is an integer between 0 and 255. The three integers together take 
up three bytes, which is 24 bits. Thus a system that has “24-bit color” has 256 
possible levels for each of the three primary colors. Issues of gamma correction 
discussed in Section 3.2.2 also apply to each RGB component separately. 
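
The additive rules and the 24-bit quantization can be illustrated with tuples of three floats. This is a sketch only: clamping the sum to 1 models a display that cannot exceed "fully on," and the 0xRRGGBB packing order is an assumption made for the example, since the text does not prescribe a byte order.

```python
RED = (1.0, 0.0, 0.0)
GREEN = (0.0, 1.0, 0.0)
BLUE = (0.0, 0.0, 1.0)
YELLOW = (1.0, 1.0, 0.0)
CYAN = (0.0, 1.0, 1.0)
MAGENTA = (1.0, 0.0, 1.0)
WHITE = (1.0, 1.0, 1.0)


def add_colors(c1, c2):
    """Additive mixing of two RGB colors, clamped to the color cube."""
    return tuple(min(1.0, a + b) for a, b in zip(c1, c2))


def to_24bit(color):
    """Quantize each component to one byte and pack the three bytes
    into a single integer (assumed layout: 0xRRGGBB)."""
    r, g, b = (int(c * 255 + 0.5) for c in color)
    return (r << 16) | (g << 8) | b


print(add_colors(RED, GREEN) == YELLOW)
print(add_colors(GREEN, BLUE) == CYAN)
print(add_colors(BLUE, RED) == MAGENTA)
print(hex(to_24bit(YELLOW)))
```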


3.4 Alpha Compositing 

Often we would like to only partially overwrite the contents of a pixel. A common 
example of this occurs in compositing, where we have a background and want 
to insert a foreground image over it. For opaque pixels in the foreground, we 
just replace the background pixel. For entirely transparent foreground pixels, we 
do not change the background pixel. For partially transparent pixels, some care 
must be taken. Partially transparent pixels can occur when the foreground object 
has partially transparent regions, such as glass, but the most frequent case where 
foreground and background must be blended is when the foreground object only 
partly covers the pixel, either at the edge of the foreground object, or when there 
are sub-pixel holes such as between the leaves of a distant tree. 

The most important piece of information needed to blend a foreground object 
over a background object is the pixel coverage, which tells the fraction of the 
pixel covered by the foreground layer. We can call this fraction $\alpha$. If we want 
to composite a foreground color $\mathbf{c}_f$ over background color $\mathbf{c}_b$, and the fraction of 
the pixel covered by the foreground is $\alpha$, then we can use the formula 

$$\mathbf{c} = \alpha \mathbf{c}_f + (1 - \alpha) \mathbf{c}_b. \tag{3.2}$$


Since the weights of the foreground and background layers add up to 1, the 
color won't change if the foreground and background layers have the same color. 


For an opaque foreground layer, the interpretation is that the foreground object 
covers area $\alpha$ within the pixel’s rectangle and the background object covers the 
remaining area, which is $(1 - \alpha)$. For a transparent layer (think of an image 
painted on glass or on tracing paper, using translucent paint), the interpretation is 
that the foreground layer blocks the fraction $\alpha$ of the light coming through 
from the background and contributes a fraction $\alpha$ of its own color to replace what 
was removed. An example of using Equation (3.2) is shown in Figure 3.14. 
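
Equation (3.2) translates directly into code. The sketch below assumes colors are tuples of three floats and that the coverage is a float in [0, 1]; the function name `over` is chosen for illustration.

```python
def over(c_f, c_b, alpha):
    """Composite foreground color c_f over background color c_b with
    coverage alpha, following Equation (3.2)."""
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(c_f, c_b))


red = (1.0, 0.0, 0.0)
blue = (0.0, 0.0, 1.0)

print(over(red, blue, 1.0))    # opaque foreground replaces the background
print(over(red, blue, 0.0))    # transparent foreground leaves it unchanged
print(over(red, blue, 0.25))   # a quarter of the pixel is covered
```

Because the two weights sum to 1, compositing a color over itself returns (up to rounding) the same color for any coverage value.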

The $\alpha$ values for all the pixels in an image might be stored in a separate 
grayscale image, which is then known as an alpha mask or transparency mask. 
Or the information can be stored as a fourth channel in an RGB image, in which 
case it is called the alpha channel, and the image can be called an RGBA image. 
With 8-bit images, each pixel then takes up 32 bits, which is a conveniently sized 
chunk in many computer architectures. 

Although Equation (3.2) is what is usually used, there are a variety of 
situations where $\alpha$ is used differently (Porter & Duff, 1984). 



Figure 3.14. An example of compositing using Equation (3.2). The foreground image is 
in effect cropped by the $\alpha$ channel before being put on top of the background image. The 
resulting composite is shown on the bottom. 


3.4.1 Image Storage 

Most RGB image formats use eight bits for each of the red, green, and blue 
channels. This results in approximately three megabytes of raw information for a 
single million-pixel image. To reduce the storage requirement, most image formats 
allow for some kind of compression. At a high level, such compression is 
either lossless or lossy. No information is discarded in lossless compression, while 
some information is lost unrecoverably in a lossy system. Popular image storage 
formats include: 

• jpeg. This lossy format compresses image blocks based on thresholds in 
the human visual system. This format works well for natural images. 

• tiff. This format is most commonly used to hold binary images or losslessly 
compressed 8- or 16-bit RGB although many other options exist. 

• ppm. This very simple lossless, uncompressed format is most often used 
for 8-bit RGB images although many options exist. 

• png. This is a set of lossless formats with a good set of open source 
management tools. 

Because of compression and variants, writing input/output routines for images 
can be involved. Fortunately one can usually rely on library routines to read and 
write standard file formats. For quick-and-dirty applications, where simplicity is 
valued above efficiency, a simple choice is to use raw ppm files, which can often 
be written simply by dumping the array that stores the image in memory to a file, 
prepending the appropriate header. 
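
As a rough illustration of the quick-and-dirty route, the sketch below builds a binary ("P6") ppm file in memory and writes it out. It assumes the image is a list of rows of 8-bit `(r, g, b)` integer triples with row 0 at the bottom, matching this book's indexing; since ppm stores the top row first, the rows are reversed on output. The function names are hypothetical.

```python
def ppm_bytes(image):
    """Encode image[j][i] = (r, g, b), with 8-bit integer components and
    row j = 0 at the bottom, as a binary PPM ("P6") byte string."""
    ny = len(image)
    nx = len(image[0])
    header = f"P6\n{nx} {ny}\n255\n".encode("ascii")
    body = bytearray()
    for row in reversed(image):      # PPM stores the top row first
        for r, g, b in row:
            body += bytes((r, g, b))
    return header + bytes(body)


def write_ppm(path, image):
    """Write the encoded image to a file."""
    with open(path, "wb") as f:
        f.write(ppm_bytes(image))


# A 2 x 2 test image: bottom row red, green; top row blue, white.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
print(len(ppm_bytes(image)))
```

If the image were already stored top row first as a flat byte array, the body would be exactly that array, which is the "dump the array" shortcut described above.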


Frequently Asked Questions 

• Why don’t they just make monitors linear and avoid all this gamma 
business? 

Ideally the 256 possible intensities of a monitor should look evenly spaced as 
opposed to being linearly spaced in energy. Because human perception of intensity is 
itself non-linear, a gamma between 1.5 and 3 (depending on viewing conditions) 
will make the intensities approximately uniform in a subjective sense. In this way 
gamma is a feature. Otherwise the manufacturers would make the monitors linear. 


Exercises 

1. Simulate an image acquired from the Bayer mosaic by taking a natural 
image (preferably a scanned photo rather than a digital photo where the Bayer 
mosaic may already have been applied) and creating a grayscale image 
composed of interleaved red/green/blue channels. This simulates the raw 
output of a digital camera. Now create a true RGB image from that output 
and compare with the original. 


Ray Tracing 


One of the basic tasks of computer graphics is rendering three-dimensional 
objects: taking a scene, or model, composed of many geometric objects arranged 
in 3D space and producing a 2D image that shows the objects as viewed from 
a particular viewpoint. It is the same operation that has been done for centuries 
by architects and engineers creating drawings to communicate their designs to 
others. 

Fundamentally, rendering is a process that takes as its input a set of objects and 
produces as its output an array of pixels. One way or another, rendering involves 
considering how each object contributes to each pixel; it can be organized in two 
general ways. In object-order rendering, each object is considered in turn, and 
for each object all the pixels that it influences are found and updated. In image- 
order rendering, each pixel is considered in turn, and for each pixel all the objects 
that influence it are found and the pixel value is computed. You can think of 
the difference in terms of the nesting of loops: in image-order rendering the “for 
each pixel” loop is on the outside, whereas in object-order rendering the “for each 
object” loop is on the outside. 
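
The loop nesting can be made concrete with a toy scene. The sketch below is an illustration under several assumptions that are not in the text: objects are axis-aligned rectangles given in pixel coordinates, each with a single depth and an integer color, and the object-order version keeps a per-pixel depth array so that the nearest object wins. All names are hypothetical.

```python
from collections import namedtuple

# Toy object: covers pixels x0..x1, y0..y1 (inclusive) at a fixed depth.
Rect = namedtuple("Rect", "x0 y0 x1 y1 depth color")


def covers(rect, i, j):
    return rect.x0 <= i <= rect.x1 and rect.y0 <= j <= rect.y1


def render_image_order(rects, nx, ny, background=0):
    """'For each pixel' is the outer loop."""
    image = [[background] * nx for _ in range(ny)]
    for j in range(ny):
        for i in range(nx):
            hits = [r for r in rects if covers(r, i, j)]
            if hits:
                image[j][i] = min(hits, key=lambda r: r.depth).color
    return image


def render_object_order(rects, nx, ny, background=0):
    """'For each object' is the outer loop; a depth array (an assumed
    helper) records the nearest object seen so far at each pixel."""
    image = [[background] * nx for _ in range(ny)]
    depth = [[float("inf")] * nx for _ in range(ny)]
    for r in rects:
        for j in range(max(0, r.y0), min(ny - 1, r.y1) + 1):
            for i in range(max(0, r.x0), min(nx - 1, r.x1) + 1):
                if r.depth < depth[j][i]:
                    depth[j][i] = r.depth
                    image[j][i] = r.color
    return image


scene = [Rect(0, 0, 2, 2, 5.0, 1), Rect(1, 1, 3, 3, 2.0, 2)]
a = render_image_order(scene, 4, 4)
b = render_object_order(scene, 4, 4)
print(a == b)
```

The two functions visit the same pixel/object pairs in a different order and, for this scene, produce identical images.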

Image-order and object-order rendering approaches can compute exactly the 
same images, but they lend themselves to computing different kinds of effects 
and have quite different performance characteristics. We’ll explore the 
comparative strengths of the approaches in Chapter 8 after we have discussed them both, 
but, broadly speaking, image-order rendering is simpler to get working and more 
flexible in the effects that can be produced, and usually (though not always) takes 
much more execution time to produce a comparable image. 


If the output is a vector 
image rather than a raster 
image, rendering doesn't 
have to involve pixels, but 
we’ll assume raster images 
in this book. 


In a ray tracer it is easy to 
compute accurate shadows 
and reflections, which are 
awkward in the object-order 
framework. 



Figure 4.1. The ray is “traced” into the scene and the first object hit is the one seen through 
the pixel. In this case, the triangle $T_2$ is returned. 


Ray tracing is an image-order algorithm for making renderings of 3D scenes, 
and we’ll consider it first because it’s possible to get a ray tracer working 
without developing any of the mathematical machinery that’s used for object-order 
rendering. 


4.1 The Basic Ray-Tracing Algorithm 

A ray tracer works by computing one pixel at a time, and for each pixel the basic 
task is to find the object that is seen at that pixel’s position in the image. Each 
pixel “looks” in a different direction, and any object that is seen by a pixel must 
intersect the viewing ray, a line that emanates from the viewpoint in the direction 
that pixel is looking. The particular object we want is the one that intersects 
the viewing ray nearest the camera, since it blocks the view of any other objects 
behind it. Once that object is found, a shading computation uses the intersection 
point, surface normal, and other information (depending on the desired type of 
rendering) to determine the color of the pixel. This is shown in Figure 4.1, where 
the ray intersects two triangles, but only the first triangle hit, $T_2$, is shaded. 

A basic ray tracer therefore has three parts: 

1. ray generation, which computes the origin and direction of each pixel’s 
viewing ray based on the camera geometry; 

2. ray intersection, which finds the closest object intersecting the viewing ray; 

3. shading, which computes the pixel color based on the results of ray 
intersection. 

The structure of the basic ray tracing program is: 

    for each pixel do 
        compute viewing ray 
        find first object hit by ray and its surface normal n 
        set pixel color to value computed from hit point, light, and n 

This chapter covers basic methods for ray generation, ray intersection, and 
shading that are sufficient for implementing a simple demonstration ray tracer. For a 
really useful system, more efficient ray intersection techniques from Chapter 12 
need to be added, and the real potential of a ray tracer will be seen with the more 
advanced shading methods from Chapter 10 and the additional rendering 
techniques from Chapter 13. 


4.2 Perspective 

The problem of representing a 3D object or scene with a 2D drawing or 
painting was studied by artists hundreds of years before computers. Photographs also 
represent 3D scenes with 2D images. While there are many unconventional ways 
to make images, from cubist painting to fish-eye lenses (Figure 4.2) to peripheral 
cameras, the standard approach for both art and photography, as well as computer 
graphics, is linear perspective, in which 3D objects are projected onto an image 
plane in such a way that straight lines in the scene become straight lines in the 
image. 

The simplest type of projection is parallel projection, in which 3D points are 
mapped to 2D by moving them along a projection direction until they hit the 
image plane (Figures 4.3-4.4). The view that is produced is determined by the 
choice of projection direction and image plane. If the image plane is perpendicular 
to the view direction, the projection is called orthographic, otherwise it is called 
oblique. 


Figure 4.2. An image taken with a fisheye lens is not a linear perspective 
image. Photo courtesy Philip Greenspan. 


Figure 4.3. When projection lines are parallel and perpendicular to the image plane, the 
resulting views are called orthographic. (Panel labels: axis-aligned, orthographic.) 


Figure 4.4. A parallel projection that has the image plane at an angle to the projection 
direction is called oblique (right). In perspective projection, the projection lines all pass through 
the viewpoint, rather than being parallel (left). The illustrated perspective view is non-oblique 
because a projection line drawn through the center of the image would be perpendicular to 
the image plane. 


Some books reserve “orthographic” for projection directions that are parallel 
to the coordinate axes. 

Parallel projections are often used for mechanical and architectural drawings 
because they keep parallel lines parallel and they preserve the size and shape of 
planar objects that are parallel to the image plane. 

The advantages of parallel projection are also its limitations. In our everyday 
experience (and even more so in photographs) objects look smaller as they get 
farther away, and as a result parallel lines receding into the distance do not 
appear parallel. This is because eyes and cameras don’t collect light from a single 
viewing direction; they collect light that passes through a particular viewpoint. 
As has been recognized by artists since the Renaissance, we can produce 
natural-looking views using perspective projection: we simply project along lines that 
pass through a single point, the viewpoint, rather than along parallel lines 
(Figure 4.4). In this way objects farther from the viewpoint naturally become smaller 
when they are projected. A perspective view is determined by the choice of 
viewpoint (rather than projection direction) and image plane. As with parallel views 
there are oblique and non-oblique perspective views; the distinction is made based 
on the projection direction at the center of the image. 


Figure 4.5. In three-point perspective, an artist picks “vanishing points” where parallel 
lines meet. Parallel horizontal lines will meet at a point on the horizon. Every set of parallel 
lines has its own vanishing points. These rules are followed automatically if we implement 
perspective based on the correct geometric principles. 

You may have learned about the artistic conventions of three-point 
perspective, a system for manually constructing perspective views (Figure 4.5). A 
surprising fact about perspective is that all the rules of perspective drawing will be 
followed automatically if we follow the simple mathematical rule underlying 
perspective: objects are projected directly toward the eye, and they are drawn where 
they meet a view plane in front of the eye. 


4.3 Computing Viewing Rays 


From the previous section, the basic tools of ray generation are the viewpoint (or 
view direction, for parallel views) and the image plane. There are many ways to 
work out the details of camera geometry; in this section we explain one based 
on orthonormal bases that supports normal and oblique parallel and perspective 
views. 

In order to generate rays, we first need a mathematical representation for a ray. 
A ray is really just an origin point and a propagation direction; a 3D parametric 
line is ideal for this. As discussed in Section 2.5.7, the 3D parametric line from 
the eye $\mathbf{e}$ to a point $\mathbf{s}$ on the image plane (Figure 4.6) is given by 

$$\mathbf{p}(t) = \mathbf{e} + t(\mathbf{s} - \mathbf{e}).$$

This should be interpreted as, “we advance from $\mathbf{e}$ along the vector $(\mathbf{s} - \mathbf{e})$ a 
fractional distance $t$ to find the point $\mathbf{p}$.” So given $t$, we can determine a point $\mathbf{p}$. 
The point $\mathbf{e}$ is the ray’s origin, and $\mathbf{s} - \mathbf{e}$ is the ray’s direction. 

Note that $\mathbf{p}(0) = \mathbf{e}$, and $\mathbf{p}(1) = \mathbf{s}$, and more generally, if $0 < t_1 < t_2$, then 
$\mathbf{p}(t_1)$ is closer to the eye than $\mathbf{p}(t_2)$. Also, if $t < 0$, then $\mathbf{p}(t)$ is “behind” the eye. 
These facts will be useful when we search for the closest object hit by the ray that 
is not behind the eye. 
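
The parametric form maps naturally onto a small data structure. The sketch below assumes points and vectors are plain 3-tuples; the class name `Ray`, its method `at`, and the helper `ray_through` are hypothetical names chosen for the example.

```python
class Ray:
    """A ray stored as an origin point and a direction vector."""

    def __init__(self, origin, direction):
        self.origin = origin
        self.direction = direction

    def at(self, t):
        """Evaluate p(t) = origin + t * direction."""
        return tuple(o + t * d for o, d in zip(self.origin, self.direction))


def ray_through(e, s):
    """Build the viewing ray from the eye e through the point s, so that
    the direction is s - e."""
    return Ray(e, tuple(si - ei for si, ei in zip(s, e)))


ray = ray_through((0, 0, 0), (1, 2, -1))
print(ray.at(0))     # the eye
print(ray.at(1))     # the point s on the image plane
print(ray.at(-1))    # a point "behind" the eye
```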

To compute a viewing ray, we need to know e (which is given) and s. Finding 
s may seem difficult, but it is actually straightforward if we look at the problem 
in the right coordinate system. 



Figure 4.6. The ray from the eye to a point on the image plane. 


Caution: we are overloading the variable t, which is the ray parameter and also 
the v-coordinate of the top edge of the image. 


Figure 4.7. The sample points on the screen are mapped to a similar array on the 3D 
window. A viewing ray is sent to each of these locations. 


Figure 4.8. The vectors of the camera frame, together with the view direction and 
up direction. The w vector is opposite the view direction, and the v vector is 
coplanar with w and the up vector. 


Since v and w have to be perpendicular, the up vector and v are not generally 
the same. But setting the up vector to point straight upward in the scene will 
orient the camera in the way we would think of as “upright.” 


All of our ray-generation methods start from an orthonormal coordinate frame 
known as the camera frame, which we’ll denote by e, for the eye point, or 
viewpoint, and u, v, and w for the three basis vectors, organized with u pointing 
rightward (from the camera’s view), v pointing upward, and w pointing backward, so 
that {u, v, w} forms a right-handed coordinate system. The most common way 
to construct the camera frame is from the viewpoint, which becomes e, the view 
direction, which is -w, and the up vector, which is used to construct a basis that 
has v and w in the plane defined by the view direction and the up direction, using 
the process for constructing an orthonormal basis from two vectors described in 
Section 2.4.7. 
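
One way to carry out this construction is sketched below. It assumes the usual cross-product recipe for building an orthonormal basis from two vectors (normalize the first, cross the second with it, then cross again); the details of Section 2.4.7 are not reproduced here, so treat the exact steps, and the helper names, as assumptions.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return tuple(c / length for c in v)


def camera_frame(view_dir, up):
    """Return (u, v, w): w points opposite the view direction, u points
    rightward, and v lies in the plane of the view direction and up."""
    w = normalize(tuple(-c for c in view_dir))
    u = normalize(cross(up, w))   # fails if up is parallel to view_dir
    v = cross(w, u)               # already unit length
    return u, v, w


# Looking down the negative z-axis with y as "up" gives the standard axes.
u, v, w = camera_frame((0, 0, -1), (0, 1, 0))
print(u, v, w)
```

When the up vector is not perpendicular to the view direction, v differs from the up vector but stays in the plane spanned by the view direction and up.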


4.3.1 Orthographic Views 


It might seem logical that orthographic viewing rays should start from infinitely
far away, but then it would not be possible to make orthographic views of an
object inside a room, for instance.


Many systems assume that l = −r and b = −t so that a width and a height suffice.


For an orthographic view, all the rays will have the direction −w. Even though
a parallel view doesn’t have a viewpoint per se, we can still use the origin of the 
camera frame to define the plane where the rays start, so that it’s possible for 
objects to be behind the camera. 

The viewing rays should start on the plane defined by the point e and the 
vectors u and v; the only remaining information required is where on the plane the 
image is supposed to be. We'll define the image dimensions with four numbers, 
for the four sides of the image: l and r are the positions of the left and right 
edges of the image, as measured from e along the u direction; and b and t are the 
positions of the bottom and top edges of the image, as measured from e along the 
v direction. Usually l < 0 < r and b < 0 < t. (See Figure 4.9.) 

In Section 3.2 we discussed pixel coordinates in an image. To fit an image
with $n_x \times n_y$ pixels into a rectangle of size $(r - l) \times (t - b)$, the pixels are
spaced a distance $(r - l)/n_x$ apart horizontally and $(t - b)/n_y$ apart vertically,
Figure 4.9. Ray generation using the camera frame. Left (parallel projection: same
direction, different origins): In an orthographic view, the rays start at the pixels'
locations on the image plane, and all share the same direction, which is equal to the
view direction. Right (perspective projection: same origin, different directions): In a
perspective view, the rays start at the viewpoint, and each ray's direction is defined
by the line through the viewpoint, e, and the pixel's location on the image plane.


with a half-pixel space around the edge to center the pixel grid within the image
rectangle. This means that the pixel at position $(i, j)$ in the raster image has the
position

$$
\begin{aligned}
u &= l + (r - l)(i + 0.5)/n_x, \\
v &= b + (t - b)(j + 0.5)/n_y,
\end{aligned}
\tag{4.1}
$$

where $(u, v)$ are the coordinates of the pixel's position on the image plane, measured
with respect to the origin e and the basis {u, v}.

In an orthographic view, we can simply use the pixel’s image-plane position 
as the ray’s starting point, and we already know the ray’s direction is the view 
direction. The procedure for generating orthographic viewing rays is then: 

compute u and v using (4.1)

ray.direction ← −w

ray.origin ← e + u u + v v

It’s very simple to make an oblique parallel view: just allow the image plane 
normal w to be specified separately from the view direction d. The procedure is 
then exactly the same, but with d substituted for −w. Of course w is still used to
construct u and v. 


With l and r both specified, there is redundancy: moving the viewpoint a bit to
the right and correspondingly decreasing l and r will not change the view (and
similarly on the v-axis).


4.3.2 Perspective Views 

For a perspective view, all the rays have the same origin, at the viewpoint; it 
is the directions that are different for each pixel. The image plane is no longer 
positioned at e, but rather some distance d in front of e; this distance is the image
plane distance, often loosely called the focal length, because choosing d plays the 
same role as choosing focal length in a real camera. The direction of each ray is 
defined by the viewpoint and the position of the pixel on the image plane. This 
situation is illustrated in Figure 4.9, and the resulting procedure is similar to the 
orthographic one: 

compute u and v using (4.1)

ray.direction ← −d w + u u + v v

ray.origin ← e

As with parallel projection, oblique perspective views can be achieved by specifying
the image plane normal separately from the projection direction, then replacing
−d w with d d in the expression for the ray direction.
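
The two ray-generation procedures can be sketched together in Python. This is an illustrative sketch under stated assumptions: the function names and the convention of passing the camera basis as tuples U, V, W are choices made for this example, and d here is the image plane distance, not the ray direction.

```python
def pixel_to_uv(i, j, nx, ny, l, r, b, t):
    """Equation (4.1): image-plane coordinates of the center of pixel (i, j)."""
    u = l + (r - l) * (i + 0.5) / nx
    v = b + (t - b) * (j + 0.5) / ny
    return u, v

def orthographic_ray(u, v, e, U, V, W):
    """Origin varies per pixel; every ray shares the direction -w."""
    origin = tuple(e[k] + u * U[k] + v * V[k] for k in range(3))
    direction = tuple(-W[k] for k in range(3))
    return origin, direction

def perspective_ray(u, v, d, e, U, V, W):
    """Every ray starts at e; the direction -d w + u u + v v varies per pixel."""
    direction = tuple(-d * W[k] + u * U[k] + v * V[k] for k in range(3))
    return tuple(e), direction

# A 2 x 2 image on the rectangle [-1, 1] x [-1, 1], camera at z = 5 looking down -z.
e = (0.0, 0.0, 5.0)
U, V, W = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
u, v = pixel_to_uv(0, 0, nx=2, ny=2, l=-1.0, r=1.0, b=-1.0, t=1.0)
ortho = orthographic_ray(u, v, e, U, V, W)
persp = perspective_ray(u, v, 1.0, e, U, V, W)
```

The perspective direction is deliberately left unnormalized, which matches the procedure in the text; code that needs a unit direction would normalize it afterward.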


4.4 Ray-Object Intersection 

Once we've generated a ray e + td, we next need to find the first intersection with
any object where t > 0. In practice it turns out to be useful to solve a slightly
more general problem: find the first intersection between the ray and a surface that
occurs at a t in the interval $[t_0, t_1]$. The basic ray intersection is the case where
$t_0 = 0$ and $t_1 = +\infty$. We solve this problem for both spheres and triangles. In
the next section, multiple objects are discussed.


4.4.1 Ray-Sphere Intersection 

Given a ray p(t) = e + td and an implicit surface f(p) = 0 (see Section 2.5.3),
we'd like to know where they intersect. Intersection points occur when points on
the ray satisfy the implicit equation, so the values of t we seek are those that solve
the equation

$$f(\mathbf{p}(t)) = 0 \quad \text{or} \quad f(\mathbf{e} + t\mathbf{d}) = 0.$$

A sphere with center $\mathbf{c} = (x_c, y_c, z_c)$ and radius $R$ can be represented by the
implicit equation

$$(x - x_c)^2 + (y - y_c)^2 + (z - z_c)^2 - R^2 = 0.$$

We can write this same equation in vector form:

$$(\mathbf{p} - \mathbf{c}) \cdot (\mathbf{p} - \mathbf{c}) - R^2 = 0.$$


Any point p that satisfies this equation is on the sphere. If we plug points on the
ray p(t) = e + td into this equation, we get an equation in terms of t that is
satisfied by the values of t that yield points on the sphere:

$$(\mathbf{e} + t\mathbf{d} - \mathbf{c}) \cdot (\mathbf{e} + t\mathbf{d} - \mathbf{c}) - R^2 = 0.$$

Rearranging terms yields

$$(\mathbf{d} \cdot \mathbf{d})t^2 + 2\mathbf{d} \cdot (\mathbf{e} - \mathbf{c})t + (\mathbf{e} - \mathbf{c}) \cdot (\mathbf{e} - \mathbf{c}) - R^2 = 0.$$

Here, everything is known except the parameter t, so this is a classic quadratic
equation in t, meaning it has the form

$$At^2 + Bt + C = 0.$$

The solution to this equation is discussed in Section 2.2. The term under the
square root sign in the quadratic solution, $B^2 - 4AC$, is called the discriminant
and tells us how many real solutions there are. If the discriminant is negative,
its square root is imaginary and the line and sphere do not intersect. If the
discriminant is positive, there are two solutions: one solution where the ray enters
the sphere and one where it leaves. If the discriminant is zero, the ray grazes
the sphere, touching it at exactly one point. Plugging in the actual terms for the
sphere and canceling a factor of two, we get

$$t = \frac{-\mathbf{d} \cdot (\mathbf{e} - \mathbf{c}) \pm \sqrt{(\mathbf{d} \cdot (\mathbf{e} - \mathbf{c}))^2 - (\mathbf{d} \cdot \mathbf{d})\left((\mathbf{e} - \mathbf{c}) \cdot (\mathbf{e} - \mathbf{c}) - R^2\right)}}{\mathbf{d} \cdot \mathbf{d}}.$$

In an actual implementation, you should first check the value of the discriminant 
before computing other terms. If the sphere is used only as a bounding object for 
more complex objects, then we need only determine whether we hit it; checking 
the discriminant suffices. 

As discussed in Section 2.5.4, the normal vector at point p is given by the
gradient n = 2(p − c). The unit normal is (p − c)/R.
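
The ray-sphere test above translates fairly directly into code. The following Python sketch is one possible arrangement, not the book's implementation: the name hit_sphere, the tuple vectors, and the choice to return None on a miss are assumptions made for illustration. It checks the discriminant first, as recommended, and returns the nearer root that lies in the interval $[t_0, t_1]$ together with the unit normal (p − c)/R.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def hit_sphere(e, d, c, R, t0=0.0, t1=math.inf):
    """Return (t, unit_normal) for the first hit with t in [t0, t1], else None."""
    ec = sub(e, c)
    A = dot(d, d)
    half_B = dot(d, ec)              # B/2, after canceling the factor of two
    C = dot(ec, ec) - R * R
    disc = half_B * half_B - A * C   # the discriminant is checked before anything else
    if disc < 0.0:
        return None                  # the line misses the sphere
    root = math.sqrt(disc)
    for t in ((-half_B - root) / A, (-half_B + root) / A):  # nearer root first
        if t0 <= t <= t1:
            p = tuple(e[k] + t * d[k] for k in range(3))
            n = tuple((p[k] - c[k]) / R for k in range(3))
            return t, n
    return None

# A ray travelling down the z-axis toward a unit sphere at the origin.
result = hit_sphere((0.0, 0.0, 5.0), (0.0, 0.0, -1.0), (0.0, 0.0, 0.0), 1.0)
```

When the sphere is only a bounding object, the function could return as soon as the discriminant is known to be nonnegative and a root falls in the interval, skipping the normal computation.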


4.4.2 Ray-Triangle Intersection 

There are many algorithms for computing ray-triangle intersections. We will
present the form that uses barycentric coordinates for the parametric plane
containing the triangle, because it requires no long-term storage other than the
vertices of the triangle (Snyder & Barr, 1987).
Figure 4.10. The ray hits the plane containing the triangle at point p.


To intersect a ray with a parametric surface, we set up a system of equations
where the Cartesian coordinates all match:

$$
\left.
\begin{aligned}
x_e + t x_d &= f(u, v) \\
y_e + t y_d &= g(u, v) \\
z_e + t z_d &= h(u, v)
\end{aligned}
\right\}
\quad \text{or} \quad \mathbf{e} + t\mathbf{d} = \mathbf{f}(u, v).
$$

Here, we have three equations and three unknowns (t, u, and v), so we can solve
numerically for the unknowns. If we are lucky, we can solve for them analytically.

In the case where the parametric surface is a parametric plane, the parametric 
equation can be written in vector form as discussed in Section 2.7.2. If the vertices 
of the triangle are a, b, and c, then the intersection will occur when 


$$\mathbf{e} + t\mathbf{d} = \mathbf{a} + \beta(\mathbf{b} - \mathbf{a}) + \gamma(\mathbf{c} - \mathbf{a}), \tag{4.2}$$

for some $t$, $\beta$, and $\gamma$. The intersection p will be at e + td as shown in Figure 4.10.
Again, from Section 2.7.2, we know the intersection is inside the triangle if and
only if $\beta > 0$, $\gamma > 0$, and $\beta + \gamma < 1$. Otherwise, the ray has hit the plane outside
the triangle, so it misses the triangle. If there are no solutions, either the triangle
is degenerate or the ray is parallel to the plane containing the triangle.

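To solve for $t$, $\beta$, and $\gamma$ in Equation (4.2), we expand it from its vector form
into the three equations for the three coordinates:

$$
\begin{aligned}
x_e + t x_d &= x_a + \beta(x_b - x_a) + \gamma(x_c - x_a), \\
y_e + t y_d &= y_a + \beta(y_b - y_a) + \gamma(y_c - y_a), \\
z_e + t z_d &= z_a + \beta(z_b - z_a) + \gamma(z_c - z_a).
\end{aligned}
$$

This can be rewritten as a standard linear system:

$$
\begin{bmatrix}
x_a - x_b & x_a - x_c & x_d \\
y_a - y_b & y_a - y_c & y_d \\
z_a - z_b & z_a - z_c & z_d
\end{bmatrix}
\begin{bmatrix} \beta \\ \gamma \\ t \end{bmatrix}
=
\begin{bmatrix} x_a - x_e \\ y_a - y_e \\ z_a - z_e \end{bmatrix}.
$$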


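The fastest classic method to solve this 3 × 3 linear system is Cramer's rule. This
gives us the solutions

$$
\beta = \frac{\begin{vmatrix}
x_a - x_e & x_a - x_c & x_d \\
y_a - y_e & y_a - y_c & y_d \\
z_a - z_e & z_a - z_c & z_d
\end{vmatrix}}{|\mathbf{A}|},
\qquad
\gamma = \frac{\begin{vmatrix}
x_a - x_b & x_a - x_e & x_d \\
y_a - y_b & y_a - y_e & y_d \\
z_a - z_b & z_a - z_e & z_d
\end{vmatrix}}{|\mathbf{A}|},
$$

$$
t = \frac{\begin{vmatrix}
x_a - x_b & x_a - x_c & x_a - x_e \\
y_a - y_b & y_a - y_c & y_a - y_e \\
z_a - z_b & z_a - z_c & z_a - z_e
\end{vmatrix}}{|\mathbf{A}|},
$$

where the matrix A is

$$
\mathbf{A} = \begin{bmatrix}
x_a - x_b & x_a - x_c & x_d \\
y_a - y_b & y_a - y_c & y_d \\
z_a - z_b & z_a - z_c & z_d
\end{bmatrix}
$$
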


and |A| denotes the determinant of A. The 3 × 3 determinants have common subterms
that can be exploited. Looking at the linear systems with dummy variables

$$
\begin{bmatrix}
a & d & g \\
b & e & h \\
c & f & i
\end{bmatrix}
\begin{bmatrix} \beta \\ \gamma \\ t \end{bmatrix}
=
\begin{bmatrix} j \\ k \\ l \end{bmatrix},
$$

Cramer's rule gives us

$$
\begin{aligned}
\beta &= \frac{j(ei - hf) + k(gf - di) + l(dh - eg)}{M}, \\
\gamma &= \frac{i(ak - jb) + h(jc - al) + g(bl - kc)}{M}, \\
t &= -\frac{f(ak - jb) + e(jc - al) + d(bl - kc)}{M},
\end{aligned}
$$

where

$$M = a(ei - hf) + b(gf - di) + c(dh - eg).$$

We can reduce the number of operations by reusing numbers such as
“ei-minus-hf.”

The algorithm for the ray-triangle intersection for which we need the linear solution
can have some conditions for early termination. Thus, the function should
look something like:

boolean raytri(ray r, vector3 a, vector3 b, vector3 c, interval [t0, t1])
    compute t
    if (t < t0) or (t > t1) then
        return false
    compute γ
    if (γ < 0) or (γ > 1) then
        return false
    compute β
    if (β < 0) or (β > 1 − γ) then
        return false
    return true
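
As a concrete illustration of the pseudocode above, here is a Python sketch that uses the dummy variables a through l and the shared subterms from the Cramer's rule formulas. The function name hit_triangle, the parameter names eye and dirn, and the use of None for a miss are assumptions of this example; the explicit M == 0 check for a degenerate triangle or a parallel ray is also an addition.

```python
def hit_triangle(eye, dirn, va, vb, vc, t0=0.0, t1=float("inf")):
    """Return (t, beta, gamma) if the ray eye + t*dirn hits triangle (va, vb, vc)."""
    a, b, c = (va[n] - vb[n] for n in range(3))    # first column of A
    d, e, f = (va[n] - vc[n] for n in range(3))    # second column of A
    g, h, i = dirn                                 # third column of A
    j, k, l = (va[n] - eye[n] for n in range(3))   # right-hand side

    # Subterms shared between the determinants.
    ei_hf, gf_di, dh_eg = e * i - h * f, g * f - d * i, d * h - e * g
    ak_jb, jc_al, bl_kc = a * k - j * b, j * c - a * l, b * l - k * c

    M = a * ei_hf + b * gf_di + c * dh_eg
    if M == 0.0:
        return None   # degenerate triangle, or ray parallel to its plane

    t = -(f * ak_jb + e * jc_al + d * bl_kc) / M
    if t < t0 or t > t1:
        return None
    gamma = (i * ak_jb + h * jc_al + g * bl_kc) / M
    if gamma < 0.0 or gamma > 1.0:
        return None
    beta = (j * ei_hf + k * gf_di + l * dh_eg) / M
    if beta < 0.0 or beta > 1.0 - gamma:
        return None
    return t, beta, gamma

# A ray pointing straight down at a right triangle lying in the plane z = 0.
result = hit_triangle((0.25, 0.25, 1.0), (0.0, 0.0, -1.0),
                      (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```

Each early return avoids the remaining divisions, which is the point of ordering the tests as t, then γ, then β.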


4.4.3 Ray-Polygon Intersection 

Given a planar polygon with m vertices $\mathbf{p}_1$ through $\mathbf{p}_m$ and surface normal n,
we first compute the intersection points between the ray e + td and the plane
containing the polygon with implicit equation

$$(\mathbf{p} - \mathbf{p}_1) \cdot \mathbf{n} = 0.$$

We do this by setting p = e + td and solving for t to get

$$t = \frac{(\mathbf{p}_1 - \mathbf{e}) \cdot \mathbf{n}}{\mathbf{d} \cdot \mathbf{n}}.$$

This allows us to compute p. If p is inside the polygon, then the ray hits it, and
otherwise it does not.

We can answer the question of whether p is inside the polygon by projecting 
the point and polygon vertices to the xy plane and answering it there. The easiest 
way to do this is to send any 2D ray out from p and to count the number of 
intersections between that ray and the boundary of the polygon (Sutherland et al., 
1974; Glassner, 1989). If the number of intersections is odd, then the point is 
inside the polygon; otherwise it is not. This is true because a ray that goes in 
must go out, thus creating a pair of intersections. Only a ray that starts inside 
will not create such a pair. To make computation simple, the 2D ray may as well 
propagate along the x-axis: 


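$$
\begin{bmatrix} x \\ y \end{bmatrix}
=
\begin{bmatrix} x_p \\ y_p \end{bmatrix}
+ s \begin{bmatrix} 1 \\ 0 \end{bmatrix}.
$$

It is straightforward to compute the intersection of that ray with the edges such as
$(x_1, y_1, x_2, y_2)$ for $s \in (0, \infty)$.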

A problem arises, however, for polygons whose projection into the xy plane
is a line. To get around this, we can choose among the xy, yz, or zx planes for
whichever is best. If we implement our points to allow an indexing operation,
e.g., $\mathbf{p}(0) = x_p$, then this can be accomplished as follows:

if (abs(z_n) > abs(x_n)) and (abs(z_n) > abs(y_n)) then
    index0 = 0
    index1 = 1
else if (abs(y_n) > abs(x_n)) then
    index0 = 0
    index1 = 2
else
    index0 = 1
    index1 = 2

Now, all computations can use p(index0) rather than $x_p$, and so on.

Another approach to polygons, one that is often used in practice, is to replace 
them by several triangles. 
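
The plane intersection, projection, and crossing count described in this section can be combined as in the following Python sketch. It is a simplified illustration, not a robust implementation: the name hit_polygon is hypothetical, the projection plane is picked by dropping the axis where the normal has the largest magnitude (equivalent in spirit to the index selection above, though ties may resolve differently), and rays passing exactly through a vertex are not treated specially.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def hit_polygon(eye, dirn, verts, n, t0=0.0, t1=float("inf")):
    """Return (t, p) if the ray hits the planar polygon verts with normal n, else None."""
    denom = dot(dirn, n)
    if denom == 0.0:
        return None                                   # ray parallel to the plane
    t = dot(sub(verts[0], eye), n) / denom
    if t < t0 or t > t1:
        return None
    p = tuple(eye[k] + t * dirn[k] for k in range(3))

    # Project onto the coordinate plane most nearly parallel to the polygon.
    drop = max(range(3), key=lambda k: abs(n[k]))
    i0, i1 = [k for k in range(3) if k != drop]
    px, py = p[i0], p[i1]

    # Count crossings of a 2D ray leaving (px, py) along the first kept axis.
    inside = False
    m = len(verts)
    for k in range(m):
        x1, y1 = verts[k][i0], verts[k][i1]
        x2, y2 = verts[(k + 1) % m][i0], verts[(k + 1) % m][i1]
        if (y1 > py) != (y2 > py):                    # edge straddles the 2D ray's line
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > px:
                inside = not inside                   # odd number of crossings: inside
    return (t, p) if inside else None

square = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
result = hit_polygon((0.5, 0.5, 1.0), (0.0, 0.0, -1.0), square, (0.0, 0.0, 1.0))
```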


4.4.4 Intersecting a Group of Objects 

Of course, most interesting scenes consist of more than one object, and when we
intersect a ray with the scene we must find only the closest intersection to the
camera along the ray. A simple way to implement this is to think of a group of
objects as itself being another type of object. To intersect a ray with a group, you
simply intersect the ray with the objects in the group and return the intersection
with the smallest t value. The following code tests for hits in the interval
$t \in [t_0, t_1]$:

hit = false
for each object o in the group do
    if (o is hit at ray parameter t and t ∈ [t0, t1]) then
        hit = true
        hitobject = o
        t1 = t
return hit
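
The key trick in the loop above is shrinking t1 after every hit, so that later objects are accepted only if they are closer. A minimal Python sketch follows; the helper make_plane_z is a hypothetical stand-in primitive (a horizontal plane) invented for this example so that the block runs on its own, and each object is modeled as a function returning a ray parameter or None.

```python
def make_plane_z(z):
    """Hypothetical primitive: the horizontal plane at height z."""
    def hit(e, d, t0, t1):
        if d[2] == 0.0:
            return None
        t = (z - e[2]) / d[2]
        return t if t0 <= t <= t1 else None
    return hit

def hit_group(objects, e, d, t0, t1):
    """Return (t, object) for the closest hit in [t0, t1], or None."""
    closest = None
    for obj in objects:
        t = obj(e, d, t0, t1)
        if t is not None:
            closest = (t, obj)
            t1 = t            # shrink the interval: only closer hits count from now on
    return closest

floor, basement, shelf = make_plane_z(0.0), make_plane_z(-2.0), make_plane_z(3.0)
result = hit_group([floor, basement, shelf], (0.0, 0.0, 5.0), (0.0, 0.0, -1.0),
                   0.0, float("inf"))
```

Here the ray meets the three planes at t = 5, 7, and 2; the basement is rejected because the interval has already shrunk to [0, 5], and the shelf replaces the floor as the closest hit.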



Figure 4.11. A simple scene rendered with only ray generation and surface
intersection, but no shading; each pixel is just set to a fixed color depending on
which object it hit.


4.5 Shading 

Once the visible surface for a pixel is known, the pixel value is computed by evaluating
a shading model. How this is done depends entirely on the application;
methods range from very simple heuristics to elaborate numerical computations.
In this chapter we describe the two most basic shading models; more advanced
models are discussed in Chapter 10.

Most shading models, one way or another, are designed to capture the process 
of light reflection, whereby surfaces are illuminated by light sources and reflect 
Illumination from real point sources falls off as distance squared, but that is often
more trouble than it's worth in a simple renderer.



Figure 4.12. Geometry for Lambertian shading.


When in doubt, make light 
sources neutral in color, 
with equal red, green, and 
blue intensities. 


part of the light to the camera. Simple shading models are defined in terms of
illumination from a point light source. The important variables in light reflection
are the light direction l, which is a unit vector pointing toward the light source;
the view direction v, which is a unit vector pointing toward the eye or camera; the
surface normal n, which is a unit vector perpendicular to the surface at the point
where reflection is taking place; and the characteristics of the surface: color,
shininess, or other properties depending on the particular model.


4.5.1 Lambertian Shading 

The simplest shading model is based on an observation made by Lambert in the
18th century: the amount of energy from a light source that falls on an area of
surface depends on the angle of the surface to the light. A surface facing directly
towards the light receives maximum illumination; a surface tangent to the light
direction (or facing away from the light) receives no illumination; and in between
the illumination is proportional to the cosine of the angle θ between the surface
normal and the light source (Figure 4.12). This leads to the Lambertian shading
model:

$$L = k_d\, I \max(0, \mathbf{n} \cdot \mathbf{l}),$$

where $L$ is the pixel color; $k_d$ is the diffuse coefficient, or the surface color; and
$I$ is the intensity of the light source. Because n and l are unit vectors, we can
use n · l as a convenient shorthand (both on paper and in code) for cos θ. This
equation (as with the other shading equations in this section) applies separately to
the three color channels, so the red component of the pixel value is the product of
the red diffuse component, the red light source intensity, and the dot product; the
same holds for green and blue.

The vector l is computed by subtracting the intersection point of the ray and
surface from the light source position. Don't forget that v, l, and n all must be
unit vectors; failing to normalize these vectors is a very common error in shading
computations.
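
A small Python sketch of Lambertian shading follows. The function name lambert and the representation of colors as RGB tuples are assumptions made for this example. The light direction is built from the light position and the shaded point, and both it and the normal are normalized before the dot product is taken.

```python
import math

def normalize(a):
    n = math.sqrt(sum(x * x for x in a))
    return tuple(x / n for x in a)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def lambert(kd, I, n, p, light_pos):
    """Per-channel L = kd * I * max(0, n . l) for a point light at light_pos."""
    l = normalize(tuple(light_pos[k] - p[k] for k in range(3)))  # toward the light
    n = normalize(n)
    cos_theta = max(0.0, dot(n, l))   # clamped so back-facing surfaces get no light
    return tuple(kd[c] * I[c] * cos_theta for c in range(3))

kd = (1.0, 0.5, 0.25)        # surface color
I = (1.0, 1.0, 1.0)          # neutral white light
overhead = lambert(kd, I, (0.0, 0.0, 1.0), (0.0, 0.0, 0.0), (0.0, 0.0, 10.0))
behind = lambert(kd, I, (0.0, 0.0, 1.0), (0.0, 0.0, 0.0), (0.0, 0.0, -10.0))
```

With the light directly overhead the result is the surface color itself; with the light behind the surface the clamp returns black.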


4.5.2 Blinn-Phong Shading 

Lambertian shading is view independent: the color of a surface does not depend 
on the direction from which you look. Many real surfaces show some degree 
of shininess, producing highlights, or specular reflections, that appear to move 
around as the viewpoint changes. Lambertian shading doesn't produce any highlights
and leads to a very matte, chalky appearance, and many shading models







Figure 4.13. A simple 
scene rendered with diffuse 
shading from a single light 
source. 


Figure 4.14. A simple scene rendered with diffuse shading and shadows
(Section 4.7) from three light sources.


Figure 4.15. A simple scene rendered with diffuse shading (right), Blinn-Phong
shading (left), and shadows (Section 4.7) from three light sources.


add a specular component to Lambertian shading; the Lambertian part is then the 
diffuse component. 

A very simple and widely used model for specular highlights was proposed
by Phong (Phong, 1975) and later updated by Blinn (J. F. Blinn, 1976) to the form
most commonly used today. The idea is to produce reflection that is at its brightest
when v and l are symmetrically positioned across the surface normal, which is
when mirror reflection would occur; the reflection then decreases smoothly as the
vectors move away from a mirror configuration.

We can tell how close we are to a mirror configuration by comparing the
half vector h (the bisector of the angle between v and l) to the surface normal
(Figure 4.16). If the half vector is near the surface normal, the specular component
should be bright; if it is far away it should be dim. This result is achieved by
computing the dot product between h and n (remember they are unit vectors, so
n · h reaches its maximum of 1 when the vectors are equal), then taking the result
to a power p > 1 to make it decrease faster. The power, or Phong exponent,
controls the apparent shininess of the surface. The half vector itself is easy to
compute: since v and l are the same length, their sum is a vector that bisects the
angle between them, which only needs to be normalized to produce h.

Putting this all together, the Blinn-Phong shading model is as follows: 


$$L = k_d\, I \max(0, \mathbf{n} \cdot \mathbf{l}) + k_s\, I \max(0, \mathbf{n} \cdot \mathbf{h})^p,$$



Figure 4.16. Geometry for 
Blinn-Phong shading. 


Typical values of p: 10 gives “eggshell”; 100, mildly shiny; 1000, really glossy;
10,000, nearly mirror-like.


When in doubt, make the 
specular color gray, with 
equal red, green, and blue 
values. 


where $k_s$ is the specular coefficient, or the specular color, of the surface.
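
The half-vector construction and the specular term can be sketched as follows. The names half_vector and blinn_phong and the RGB-tuple color representation are assumptions of this example. The sketch does not handle the degenerate case where l and v point in exactly opposite directions, since their sum would then be the zero vector.

```python
import math

def normalize(a):
    n = math.sqrt(sum(x * x for x in a))
    return tuple(x / n for x in a)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def half_vector(v, l):
    """Bisector of v and l; both are normalized first so their sum bisects the angle."""
    v, l = normalize(v), normalize(l)
    return normalize(tuple(x + y for x, y in zip(v, l)))

def blinn_phong(kd, ks, p, I, n, l, v):
    """Per-channel L = kd I max(0, n.l) + ks I max(0, n.h)^p."""
    n, l = normalize(n), normalize(l)
    h = half_vector(v, l)
    diffuse = max(0.0, dot(n, l))
    specular = max(0.0, dot(n, h)) ** p
    return tuple(kd[c] * I[c] * diffuse + ks[c] * I[c] * specular for c in range(3))

kd, ks, I = (0.5, 0.5, 0.5), (0.25, 0.25, 0.25), (1.0, 1.0, 1.0)
n = (0.0, 0.0, 1.0)
# Mirror configuration: v and l are symmetric about n, so h equals n.
mirror = blinn_phong(kd, ks, 100, I, n, l=(1.0, 0.0, 1.0), v=(-1.0, 0.0, 1.0))
# Viewer on the same side as the light: h is far from n and the highlight vanishes.
off = blinn_phong(kd, ks, 100, I, n, l=(1.0, 0.0, 1.0), v=(1.0, 0.0, 1.0))
```

The diffuse term is the same in both calls; only the specular term changes with the view direction, which is what makes the highlight appear to move.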
4.5.3 Ambient Shading


In the real world, surfaces that are not illuminated by light sources are
illuminated by indirect reflections from other surfaces.


Surfaces that receive no illumination at all will be rendered as completely black, 
which is often not desirable. A crude but useful heuristic to avoid black shadows 
is to add a constant component to the shading model, one whose contribution 
to the pixel color depends only on the object hit, with no dependence on the 
surface geometry at all. This is known as ambient shading—it is as if surfaces 
were illuminated by “ambient” light that comes equally from everywhere. For 
convenience in tuning the parameters, ambient shading is usually expressed as 
the product of a surface color with an ambient light color, so that ambient shading 
can be tuned for surfaces individually or for all surfaces together. Together with 
the rest of the Blinn-Phong model, ambient shading completes the full version of 
a simple and useful shading model: 


$$L = k_a I_a + k_d\, I \max(0, \mathbf{n} \cdot \mathbf{l}) + k_s\, I \max(0, \mathbf{n} \cdot \mathbf{h})^p, \tag{4.3}$$

where $k_a$ is the surface's ambient coefficient, or “ambient color,” and $I_a$ is the
ambient light intensity.


When in doubt set the ambient color to be the same as the diffuse color.


4.5.4 Multiple Point Lights 

A very useful property of light is superposition —the effect caused by more than 
one light source is simply the sum of the effects of the light sources individually. 
For this reason, our simple shading model can easily be extended to handle N 
light sources: 

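$$L = k_a I_a + \sum_{i=1}^{N} \left[ k_d\, I_i \max(0, \mathbf{n} \cdot \mathbf{l}_i) + k_s\, I_i \max(0, \mathbf{n} \cdot \mathbf{h}_i)^p \right], \tag{4.4}$$

where $I_i$, $\mathbf{l}_i$, and $\mathbf{h}_i$ are the intensity, direction, and half vector of the $i$th light
source.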
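
Equation (4.4) amounts to one ambient term plus a loop over the lights. The Python sketch below is one way to write it; the function name shade and the representation of each light as an (intensity, direction) pair are assumptions made for this illustration.

```python
import math

def normalize(a):
    n = math.sqrt(sum(x * x for x in a))
    return tuple(x / n for x in a)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def shade(ka, Ia, kd, ks, p, n, v, lights):
    """Equation (4.4). lights is a list of (I, l): RGB intensity, direction to the light."""
    n, v = normalize(n), normalize(v)
    color = [ka[c] * Ia[c] for c in range(3)]          # ambient term, added once
    for I, l in lights:
        l = normalize(l)
        h = normalize(tuple(a + b for a, b in zip(v, l)))
        diffuse = max(0.0, dot(n, l))
        specular = max(0.0, dot(n, h)) ** p
        for c in range(3):
            color[c] += kd[c] * I[c] * diffuse + ks[c] * I[c] * specular
    return tuple(color)

ka, Ia = (0.1, 0.1, 0.1), (1.0, 1.0, 1.0)
kd, ks = (0.5, 0.5, 0.5), (0.25, 0.25, 0.25)
n = v = (0.0, 0.0, 1.0)
bright = ((1.0, 1.0, 1.0), (0.0, 0.0, 2.0))
dim = ((0.5, 0.5, 0.5), (0.0, 0.0, 1.0))
both = shade(ka, Ia, kd, ks, 10, n, v, [bright, dim])
```

Because of superposition, the contribution of the two lights together equals the sum of their individual contributions; only the ambient term must not be counted twice.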


4.6 A Ray-Tracing Program 

We now know how to generate a viewing ray for a given pixel, how to find the 
closest intersection with an object, and how to shade the resulting intersection. 
These are all the parts required for a program that produces shaded images with 
hidden surfaces removed. 


for each pixel do
    compute viewing ray
    if (ray hits an object with t ∈ [0, ∞)) then
        compute n
        evaluate shading model and set pixel to that color
    else
        set pixel color to background color

Here the statement “if ray hits an object...” can be implemented using the
algorithm of Section 4.4.4.

In an actual implementation, the surface intersection routine needs to somehow
return either a reference to the object that is hit, or at least its normal vector
and shading-relevant material properties. This is often done by passing a
record/structure with such information. In an object-oriented implementation, it
is a good idea to have a class called something like surface with derived classes
triangle, sphere, group, etc. Anything that a ray can intersect would be under that
class. The ray-tracing program would then have one reference to a “surface” for
the whole model, and new types of objects and efficiency structures can be added
transparently.

4.6.1 Object-Oriented Design for a Ray-Tracing Program 

As mentioned earlier, the key class hierarchy in a ray tracer consists of the geometric
objects that make up the model. These should be subclasses of some geometric
object class, and they should support a hit function (Kirk & Arvo, 1988). To 
avoid confusion from use of the word “object,” surface is the class name often 
used. With such a class, you can create a ray tracer that has a general interface 
that assumes little about modeling primitives and debug it using only spheres. An 
important point is that anything that can be “hit” by a ray should be part of this 
class hierarchy, e.g., even a collection of surfaces should be considered a subclass 
of the surface class. This includes efficiency structures, such as bounding volume 
hierarchies; they can be hit by a ray, so they are in the class. 

For example, the “abstract” or “base” class would specify the hit function as 
well as a bounding box function that will prove useful later: 

class surface
    virtual bool hit(ray e + td, real t0, real t1, hit-record rec)
    virtual box bounding-box()

Here (t0, t1) is the interval on the ray where hits will be returned, and rec is a
record that is passed by reference; it contains data such as the t at the intersection
when hit returns true. The type box is a 3D “bounding box,” that is, two points that
define an axis-aligned box that encloses the surface. For example, for a sphere,
the function would be implemented by

box sphere::bounding-box()
    vector3 min = center - vector3(radius, radius, radius)
    vector3 max = center + vector3(radius, radius, radius)
    return box(min, max)

Another class that is useful is material. This allows you to abstract the material 
behavior and later add materials transparently. A simple way to link objects and 
materials is to add a pointer to a material in the surface class, although more 
programmable behavior might be desirable. A big question is what to do with 
textures; are they part of the material class or do they live outside of the material 
class? This will be discussed more in Chapter 11. 
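
To make the class design concrete, here is a compact Python sketch of the hierarchy. It is one possible arrangement rather than a prescribed one: the class names Surface, Sphere, Group, and HitRecord, the snake_case method bounding_box, and the representation of a box as a (min, max) pair of tuples are all assumptions of this example. The ray is passed as its origin e and direction d.

```python
import math
from dataclasses import dataclass

@dataclass
class HitRecord:
    """Filled in by hit() when it returns True."""
    t: float = math.inf
    n: tuple = (0.0, 0.0, 0.0)

class Surface:
    """Base class: anything a ray can hit."""
    def hit(self, e, d, t0, t1, rec):
        raise NotImplementedError
    def bounding_box(self):
        raise NotImplementedError

class Sphere(Surface):
    def __init__(self, center, radius):
        self.center, self.radius = center, radius

    def hit(self, e, d, t0, t1, rec):
        ec = tuple(e[k] - self.center[k] for k in range(3))
        A = sum(x * x for x in d)
        half_B = sum(d[k] * ec[k] for k in range(3))
        C = sum(x * x for x in ec) - self.radius ** 2
        disc = half_B * half_B - A * C
        if disc < 0.0:
            return False
        root = math.sqrt(disc)
        for t in ((-half_B - root) / A, (-half_B + root) / A):
            if t0 <= t <= t1:
                p = tuple(e[k] + t * d[k] for k in range(3))
                rec.t = t
                rec.n = tuple((p[k] - self.center[k]) / self.radius for k in range(3))
                return True
        return False

    def bounding_box(self):
        lo = tuple(c - self.radius for c in self.center)
        hi = tuple(c + self.radius for c in self.center)
        return lo, hi

class Group(Surface):
    """A collection of surfaces is itself a surface."""
    def __init__(self, members):
        self.members = list(members)

    def hit(self, e, d, t0, t1, rec):
        found = False
        for s in self.members:
            if s.hit(e, d, t0, t1, rec):
                found = True
                t1 = rec.t        # keep only hits closer than the best so far
        return found

    def bounding_box(self):
        boxes = [s.bounding_box() for s in self.members]
        lo = tuple(min(b[0][k] for b in boxes) for k in range(3))
        hi = tuple(max(b[1][k] for b in boxes) for k in range(3))
        return lo, hi

scene = Group([Sphere((0.0, 0.0, 0.0), 1.0), Sphere((0.0, 0.0, 3.0), 1.0)])
rec = HitRecord()
was_hit = scene.hit((0.0, 0.0, 10.0), (0.0, 0.0, -1.0), 0.0, math.inf, rec)
```

Because Group derives from Surface, the renderer can hold a single Surface reference for the whole model, and groups can be nested.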


4.7 Shadows 


Once you have a basic ray tracing program, shadows can be added very easily.
Recall from Section 4.5 that light comes from some direction l. If we imagine
ourselves at a point p on a surface being shaded, the point is in shadow if we
“look” in direction l and see an object. If there are no objects, then the light is not
blocked.

This is shown in Figure 4.17, where the ray p + tl does not hit any objects,
so p is not in shadow. The point q is in shadow because the ray q + tl
does hit an object. The vector l is the same for both points because the light
is “far” away. This assumption will later be relaxed. The rays that determine
in or out of shadow are called shadow rays to distinguish them from viewing
rays.

To get the algorithm for shading, we add an if statement to determine whether
the point is in shadow. In a naive implementation, the shadow ray will check
for t ∈ [0, ∞), but because of numerical imprecision, this can result in an
intersection with the surface on which p lies. Instead, the usual adjustment to avoid
that problem is to test for t ∈ [ε, ∞) where ε is some small positive constant
(Figure 4.18).

If we implement shadow rays for Phong lighting with Equation 4.3 then we 
have the following: 




function raycolor(ray e + td, real t0, real t1)
    hit-record rec, srec
    if (scene→hit(e + td, t0, t1, rec)) then
        p = e + (rec.t) d
        color c = rec.k_a I_a
        if (not scene→hit(p + sl, ε, ∞, srec)) then
            vector3 h = normalized(normalized(l) + normalized(−d))
            c = c + rec.k_d I max(0, rec.n · l) + (rec.k_s) I (rec.n · h)^rec.p
        return c
    else
        return background-color

Note that the ambient color is added whether p is in shadow or not. If there are
multiple light sources, we can send a shadow ray before evaluating the shading
model for each light. The code above assumes that d and l are not necessarily unit
vectors. This is crucial for d, in particular, if we wish to cleanly add instancing
later (see Section 13.2).
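
The shadow test and the role of ε can be illustrated with a small Python sketch. Everything here is simplified for the sake of a runnable example: the scene is a list of (center, radius) spheres, colors are single scalars rather than RGB, only the ambient and diffuse terms are evaluated, and the names sphere_hit, in_shadow, and shade are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def sphere_hit(center, radius, e, d, t0, t1):
    """Smallest t in [t0, t1] where e + t d meets the sphere, or None."""
    ec = sub(e, center)
    A, half_B, C = dot(d, d), dot(d, ec), dot(ec, ec) - radius * radius
    disc = half_B * half_B - A * C
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    for t in ((-half_B - root) / A, (-half_B + root) / A):
        if t0 <= t <= t1:
            return t
    return None

def in_shadow(spheres, p, l, eps=1e-6):
    """Shadow ray p + s l, tested on [eps, inf) so the surface at p is not re-hit."""
    return any(sphere_hit(c, r, p, l, eps, math.inf) is not None for c, r in spheres)

def shade(spheres, p, n, l, ka_Ia, kd, I, eps=1e-6):
    c = ka_Ia                           # ambient is added whether or not p is shadowed
    if not in_shadow(spheres, p, l, eps):
        c += kd * I * max(0.0, dot(n, l))
    return c

spheres = [((0.0, 0.0, 3.0), 1.0)]      # one sphere hovering above the ground plane
up = (0.0, 0.0, 1.0)                    # direction toward a distant overhead light
under = shade(spheres, (0.0, 0.0, 0.0), up, up, ka_Ia=0.1, kd=0.8, I=1.0)
clear = shade(spheres, (5.0, 0.0, 0.0), up, up, ka_Ia=0.1, kd=0.8, I=1.0)
```

The point directly beneath the sphere receives only the ambient term, while the point off to the side is fully lit. A point on top of the sphere shows why ε matters: a shadow ray starting there finds a root at exactly s = 0, which the interval [ε, ∞) excludes.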


4.8 Ideal Specular Reflection 

It is straightforward to add ideal specular reflection, or mirror reflection, to a
ray-tracing program. The key observation is shown in Figure 4.19 where a viewer
looking from direction e sees what is in direction r as seen from the surface. The
vector r is found using a variant of the Phong lighting reflection Equation (10.6).
There are sign changes because the vector d points toward the surface in this case,
so,

$$\mathbf{r} = \mathbf{d} - 2(\mathbf{d} \cdot \mathbf{n})\mathbf{n}. \tag{4.5}$$

In the real world, some energy is lost when the light reflects from the surface, and
this loss can be different for different colors. For example, gold reflects yellow
more efficiently than blue, so it shifts the colors of the objects it reflects. This can
be implemented by adding a recursive call in raycolor:

color c = c + k_m raycolor(p + sr, ε, ∞)

where $k_m$ (for “mirror reflection”) is the specular RGB color. We need to make
sure we test for s ∈ [ε, ∞) for the same reason as we did with shadow rays; we
don't want the reflection ray to hit the object that generates it.

Figure 4.18. By testing in the interval starting at ε, we avoid numerical
imprecision causing the ray to hit the surface p is on.

Figure 4.19. When looking into a perfect mirror, the viewer looking in direction d
will see whatever the viewer “below” the surface would see in direction r.

Figure 4.20. A simple scene rendered with diffuse and Blinn-Phong shading,
shadows from three light sources, and specular reflection from the floor.

The problem with the recursive call above is that it may never terminate. For
example, if a ray starts inside a room, it will bounce forever. This can be fixed by
adding a maximum recursion depth. The code will be more efficient if a reflection
ray is generated only if $k_m$ is not zero (black).
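
The effect of the depth limit can be seen in a toy Python sketch. The scene is deliberately artificial and entirely hypothetical: a mirrored floor at z = 0 and a mirrored ceiling at z = 1, between which a vertical ray would bounce forever. Colors are scalars standing in for RGB, and the names reflect, closest_plane, and raycolor, as well as the max_depth parameter, are choices made for this example.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def reflect(d, n):
    """Equation (4.5): r = d - 2 (d . n) n, with n a unit normal."""
    k = 2.0 * dot(d, n)
    return tuple(d[i] - k * n[i] for i in range(3))

# Each plane is (height z, unit normal, local color, k_m).
PLANES = [(0.0, (0.0, 0.0, 1.0), 0.1, 0.5),
          (1.0, (0.0, 0.0, -1.0), 0.1, 0.5)]
BACKGROUND = 0.0
EPS = 1e-6

def closest_plane(e, d, t0, t1):
    best = None
    for z, n, local, km in PLANES:
        if d[2] == 0.0:
            continue
        t = (z - e[2]) / d[2]
        if t0 <= t <= t1:
            best, t1 = (t, n, local, km), t
    return best

def raycolor(e, d, t0, t1, depth=0, max_depth=5):
    if depth >= max_depth:
        return 0.0                      # give up: deeper bounces contribute nothing
    hit = closest_plane(e, d, t0, t1)
    if hit is None:
        return BACKGROUND
    t, n, local, km = hit
    c = local                           # stands in for the locally shaded color
    if km != 0.0:                       # skip the reflection ray for non-mirrors
        p = tuple(e[i] + t * d[i] for i in range(3))
        c += km * raycolor(p, reflect(d, n), EPS, float("inf"), depth + 1, max_depth)
    return c

color = raycolor((0.0, 0.0, 0.5), (0.0, 0.0, -1.0), 0.0, float("inf"), max_depth=3)
```

With max_depth = 3 the result is 0.1 + 0.5(0.1 + 0.5(0.1)) = 0.175; raising the limit adds ever smaller terms, which is why a modest depth is usually enough.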


4.9 Historical Notes 

Ray tracing was developed early in the history of computer graphics (Appel, 
1968) but was not used much until a while later when sufficient compute power 
was available (Kay & Greenberg, 1979; Whitted, 1980). 

Ray tracing has a lower asymptotic time complexity than basic object-order
rendering (Snyder & Barr, 1987; Muuss, 1995; Parker, Martin, et al., 1999; Wald
et al., 2001). Although it was traditionally thought of as an offline method,
real-time ray tracing implementations are becoming more and more common.


Frequently Asked Questions 

• Why is there no perspective matrix in ray tracing? 

The perspective matrix in a z-buffer exists so that we can turn the perspective
projection into a parallel projection. This is not needed in ray tracing, because it is
easy to do the perspective projection implicitly by fanning the rays out from the
eye.

• Can ray tracing be made interactive? 

For sufficiently small models and images, any modern PC is sufficiently powerful
for ray tracing to be interactive. In practice, multiple CPUs with a shared
frame buffer are required for a full-screen implementation. Computer power is
increasing much faster than screen resolution, and it is just a matter of time before
conventional PCs can ray trace complex scenes at screen resolution.

• Is ray tracing useful in a hardware graphics program? 

Ray tracing is frequently used for picking. When the user clicks the mouse on a 
pixel in a 3D graphics program, the program needs to determine which object is 
visible within that pixel. Ray tracing is an ideal way to determine that. 


Exercises

1. What are the ray parameters of the intersection points between ray (1, 1, 1) +
t(−1, −1, −1) and the sphere centered at the origin with radius 1? Note:
this is a good debugging case.

2. What are the barycentric coordinates and ray parameter where the ray
(1, 1, 1) + t(−1, −1, −1) hits the triangle with vertices (1, 0, 0), (0, 1, 0),
and (0, 0, 1)? Note: this is a good debugging case.

3. Do a back of the envelope computation of the approximate time complexity 
of ray tracing on “nice” (non-adversarial) models. Split your analysis into 
the cases of preprocessing and computing the image, so that you can predict 
the behavior of ray tracing multiple frames for a static model. 
Linear Algebra


Perhaps the most universal tools of graphics programs are the matrices that 
change or transform points and vectors. In the next chapter, we will see 
how a vector can be represented as a matrix with a single column, and how 
the vector can be represented in a different basis via multiplication with a 
square matrix. We will also describe how we can use such multiplications to 
accomplish changes in the vector such as scaling, rotation, and translation. In this 
chapter, we review basic linear algebra from a geometric perspective, focusing on 
intuition and algorithms that work well in the two- and three-dimensional case. 

This chapter can be skipped by readers comfortable with linear algebra. However,
there may be some enlightening tidbits even for such readers, such as the
development of determinants and the discussion of singular and eigenvalue
decomposition.

5.1 Determinants 

We usually think of determinants as arising in the solution of linear equations.
However, for our purposes, we will think of determinants as another way to multiply
vectors. For 2D vectors a and b, the determinant |ab| is the area of the
parallelogram formed by a and b (Figure 5.1). This is a signed area, and the
sign is positive if a and b are right-handed and negative if they are left-handed.
This means |ab| = −|ba|. In 2D we can interpret “right-handed” as meaning we
rotate the first vector counterclockwise to close the smallest angle to the second
vector. In 3D the determinant must be taken with three vectors at a time. For
three 3D vectors, a, b, and c, the determinant |abc| is the signed volume of the


Figure 5.1. The signed 
area of the parallelogram is 
|ab|, and in this case the 
area is positive. 



Figure 5.2. The 

signed volume of the paral¬ 
lelepiped shown is denoted 
by the determinant |abc|, 
and in this case the volume 
is positive because the vec¬ 
tors form a right-handed ba¬ 
sis. 
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Figure 5.3. Scaling a par¬ 
allelogram along one direc¬ 
tion changes the area in the 
same proportion. 



Figure 5.4. Shearing 
a parallelogram does not 
change its area. These 
four parallelograms have 
the same length base and 
thus the same area. 



Figure 5.5. The geometry 
behind Equation 5.1. Both 
of the parallelograms on the 
left can be sheared to cover 
the single parallelogram on 
the right. 


parallelepiped (3D parallelogram; a sheared 3D box) formed by the three vectors 
(Figure 5.2). To compute a 2D determinant, we first need to establish a few of its 
properties. We note that scaling one side of a parallelogram scales its area by the 
same fraction (Figure 5.3): 

| (fca)b| = |a(fcb)| = fc|ab|. 

Also, we note that “shearing” a parallelogram does not change its area (Fig¬ 
ure 5.4): 

| (a + fcb)b| = |a(b + fca)| = |ab|. 

Finally, we see that the determinant has the following property: 

|a(b + c)| = |ab| + |ac|, (5.1) 

because as shown in Figure 5.5 we can “slide” the edge between the two parallel¬ 
ograms over to form a single parallelogram without changing the area of either of 
the two original parallelograms. 

Now let’s assume a Cartesian representation for a and b: 

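$$\begin{aligned}
|\mathbf{a}\mathbf{b}| &= |(x_a\mathbf{x} + y_a\mathbf{y})(x_b\mathbf{x} + y_b\mathbf{y})| \\
&= x_a x_b |\mathbf{x}\mathbf{x}| + x_a y_b |\mathbf{x}\mathbf{y}| + y_a x_b |\mathbf{y}\mathbf{x}| + y_a y_b |\mathbf{y}\mathbf{y}| \\
&= x_a x_b (0) + x_a y_b (+1) + y_a x_b (-1) + y_a y_b (0) \\
&= x_a y_b - y_a x_b.
\end{aligned}$$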

This simplification uses the fact that |vv| = 0 for any vector v, because the 
parallelograms would all be collinear with v and thus without area. 

In three dimensions, the determinant of three 3D vectors a, b, and c is denoted 
|abc|. With Cartesian representations for the vectors, there are analogous rules 
for parallelepipeds as there are for parallelograms, and we can do an analogous 
expansion as we did for 2D: 

$$\begin{aligned}
|\mathbf{a}\mathbf{b}\mathbf{c}| &= |(x_a\mathbf{x} + y_a\mathbf{y} + z_a\mathbf{z})(x_b\mathbf{x} + y_b\mathbf{y} + z_b\mathbf{z})(x_c\mathbf{x} + y_c\mathbf{y} + z_c\mathbf{z})| \\
&= x_a y_b z_c - x_a z_b y_c - y_a x_b z_c + y_a z_b x_c + z_a x_b y_c - z_a y_b x_c.
\end{aligned}$$

As you can see, the computation of determinants in this fashion gets uglier as the
dimension increases. We will discuss less error-prone ways to compute
determinants in Section 5.3.
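The two expansions above translate directly into code. The following is a minimal sketch, not a library routine; the function names det2 and det3 are illustrative, and vectors are assumed to be plain tuples of numbers.

```python
def det2(a, b):
    """Signed area |ab| of the parallelogram spanned by 2D vectors a and b."""
    xa, ya = a
    xb, yb = b
    return xa * yb - ya * xb


def det3(a, b, c):
    """Signed volume |abc| of the parallelepiped spanned by 3D vectors a, b, c."""
    xa, ya, za = a
    xb, yb, zb = b
    xc, yc, zc = c
    # The six terms of the Cartesian expansion, in the same order as the text.
    return (xa * yb * zc - xa * zb * yc
            - ya * xb * zc + ya * zb * xc
            + za * xb * yc - za * yb * xc)
```

Swapping the two arguments of det2 flips the sign, and adding a multiple of one argument to the other (a shear) leaves the value unchanged, which mirrors the properties established earlier in this section.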

Example. Determinants arise naturally when computing the expression for one
vector as a linear combination of two others, for example, if we wish to express
a vector c as a combination of vectors a and b:

$$\mathbf{c} = a_c\mathbf{a} + b_c\mathbf{b}.$$

Figure 5.6. On the left, the vector c can be represented using two basis vectors
as $a_c\mathbf{a} + b_c\mathbf{b}$. On the right, we see that the parallelogram
formed by a and c is a sheared version of the parallelogram formed by
$b_c\mathbf{b}$ and a.

We can see from Figure 5.6 that

$$|(b_c\mathbf{b})\mathbf{a}| = |\mathbf{c}\mathbf{a}|,$$

because these parallelograms are just sheared versions of each other. Solving for
$b_c$ yields

$$b_c = \frac{|\mathbf{c}\mathbf{a}|}{|\mathbf{b}\mathbf{a}|}.$$

An analogous argument yields

$$a_c = \frac{|\mathbf{b}\mathbf{c}|}{|\mathbf{b}\mathbf{a}|}.$$

This is the two-dimensional version of Cramer’s rule, which we will revisit in
Section 5.3.2.
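As a rough illustration of the example above, the two ratios can be evaluated with a 2D determinant helper. This is a sketch under the assumption that vectors are 2-tuples; the names det2 and combination_weights are hypothetical.

```python
def det2(a, b):
    """2D determinant |ab| = x_a * y_b - y_a * x_b."""
    return a[0] * b[1] - a[1] * b[0]


def combination_weights(a, b, c):
    """Return (a_c, b_c) such that c = a_c * a + b_c * b.

    Uses a_c = |bc| / |ba| and b_c = |ca| / |ba| from the example above.
    """
    denom = det2(b, a)
    if denom == 0:
        # a and b are parallel (or zero), so they cannot serve as a 2D basis.
        raise ValueError("a and b do not form a basis")
    return det2(b, c) / denom, det2(c, a) / denom
```

For instance, with a = (1, 0), b = (1, 1), and c = (3, 2), the function returns weights 1 and 2, since (3, 2) = 1(1, 0) + 2(1, 1).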


5.2 Matrices 

A matrix is an array of numeric elements that follow certain arithmetic rules. An 
example of a matrix with two rows and three columns is 

$$\begin{bmatrix} 1.7 & -1.2 & 4.2 \\ 3.0 & 4.5 & -7.2 \end{bmatrix}.$$

Matrices are frequently used in computer graphics for a variety of purposes
including representation of spatial transforms. For our discussion, we assume the
elements of a matrix are all real numbers. This chapter describes both the
mechanics of matrix arithmetic and the determinant of “square” matrices, i.e.,
matrices with the same number of rows as columns.


5.2.1 Matrix Arithmetic 


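A matrix times a constant results in a matrix where each element has been
multiplied by that constant, e.g.,

$$2\begin{bmatrix} 1 & -4 \\ 3 & 2 \end{bmatrix} = \begin{bmatrix} 2 & -8 \\ 6 & 4 \end{bmatrix}.$$

Matrices also add element by element, e.g.,

$$\begin{bmatrix} 1 & -4 \\ 3 & 2 \end{bmatrix} + \begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix} = \begin{bmatrix} 3 & -2 \\ 5 & 4 \end{bmatrix}.$$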

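For matrix multiplication, we “multiply” rows of the first matrix with columns of
the second matrix:

$$\begin{bmatrix}
a_{11} & \cdots & a_{1m} \\
\vdots & & \vdots \\
a_{i1} & \cdots & a_{im} \\
\vdots & & \vdots \\
a_{r1} & \cdots & a_{rm}
\end{bmatrix}
\begin{bmatrix}
b_{11} & \cdots & b_{1j} & \cdots & b_{1c} \\
\vdots & & \vdots & & \vdots \\
b_{m1} & \cdots & b_{mj} & \cdots & b_{mc}
\end{bmatrix}
=
\begin{bmatrix}
p_{11} & \cdots & p_{1j} & \cdots & p_{1c} \\
\vdots & & \vdots & & \vdots \\
p_{i1} & \cdots & p_{ij} & \cdots & p_{ic} \\
\vdots & & \vdots & & \vdots \\
p_{r1} & \cdots & p_{rj} & \cdots & p_{rc}
\end{bmatrix}.$$

So the element $p_{ij}$ of the resulting product is

$$p_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{im}b_{mj}. \tag{5.2}$$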


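Taking a product of two matrices is only possible if the number of columns of the
left matrix is the same as the number of rows of the right matrix. For example,

$$\begin{bmatrix} 0 & 1 \\ 2 & 3 \\ 4 & 5 \end{bmatrix}
\begin{bmatrix} 6 & 7 & 8 & 9 \\ 0 & 1 & 2 & 3 \end{bmatrix}
= \begin{bmatrix} 0 & 1 & 2 & 3 \\ 12 & 17 & 22 & 27 \\ 24 & 33 & 42 & 51 \end{bmatrix}.$$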


Matrix multiplication is not commutative in most instances:

$$\mathbf{A}\mathbf{B} \neq \mathbf{B}\mathbf{A}. \tag{5.3}$$

Also, if AB = AC, it does not necessarily follow that B = C. Fortunately,
matrix multiplication is associative and distributive:

(AB)C = A(BC), 

A(B + C) = AB + AC, 

(A + B)C = AC + BC. 
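The rule in Equation (5.2) and the caveats above can be checked with a few lines of code. This is a minimal sketch that assumes matrices are stored as lists of rows; mat_mul is an illustrative name, not a function from any particular library.

```python
def mat_mul(a, b):
    """Product of an r x m matrix a and an m x c matrix b, per Equation (5.2)."""
    r, m, c = len(a), len(b), len(b[0])
    if any(len(row) != m for row in a):
        raise ValueError("columns of the left matrix must match rows of the right")
    # p[i][j] = a[i][0]*b[0][j] + a[i][1]*b[1][j] + ... + a[i][m-1]*b[m-1][j]
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(c)]
            for i in range(r)]


# Small matrices that exhibit the two caveats from the text.
A = [[0, 1], [0, 0]]
B = [[1, 0], [0, 0]]
C = [[0, 0], [0, 0]]
print(mat_mul(A, B) != mat_mul(B, A))              # AB and BA differ
print(mat_mul(A, B) == mat_mul(A, C) and B != C)   # AB = AC although B != C
```

Both print statements evaluate their comparisons on these particular matrices; other choices of A, B, and C may of course commute.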


5.2.2 Operations on Matrices 


We would like a matrix analog of the inverse of a real number. We know the 
inverse of a real number x is 1/x and that the product of x and its inverse is 1. 
We need a matrix I that we can think of as a “matrix one.” This exists only for 
square matrices and is known as the identity matrix ; it consists of ones down the 
diagonal and zeroes elsewhere. For example, the four by four identity matrix is 


$$\mathbf{I} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$


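The inverse matrix $\mathbf{A}^{-1}$ of a matrix A is the matrix that ensures
$\mathbf{A}\mathbf{A}^{-1} = \mathbf{I}$. For example,

$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}^{-1} = \begin{bmatrix} -2.0 & 1.0 \\ 1.5 & -0.5 \end{bmatrix}
\quad\text{because}\quad
\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\begin{bmatrix} -2.0 & 1.0 \\ 1.5 & -0.5 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$

Note that the inverse of $\mathbf{A}^{-1}$ is A. So
$\mathbf{A}\mathbf{A}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \mathbf{I}$. The inverse of a
product of two matrices is the product of the inverses, but with the order reversed:

$$(\mathbf{A}\mathbf{B})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}. \tag{5.4}$$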


We will return to the question of computing inverses later in the chapter. 

The transpose $\mathbf{A}^T$ of a matrix A has the same numbers but the rows are
switched with the columns. If we label the entries of $\mathbf{A}^T$ as $a'_{ij}$, then

$$a'_{ij} = a_{ji}.$$

For example,

$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}^T = \begin{bmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{bmatrix}.$$

The transpose of a product of two matrices obeys a rule similar to Equation (5.4):

$$(\mathbf{A}\mathbf{B})^T = \mathbf{B}^T\mathbf{A}^T.$$

The determinant of a square matrix is simply the determinant of the columns
of the matrix, considered as a set of vectors. The determinant has several nice
relationships to the matrix operations just discussed, which we list here for
reference:

$$|\mathbf{A}\mathbf{B}| = |\mathbf{A}|\,|\mathbf{B}|, \tag{5.5}$$

$$|\mathbf{A}^{-1}| = \frac{1}{|\mathbf{A}|}, \tag{5.6}$$

$$|\mathbf{A}^T| = |\mathbf{A}|. \tag{5.7}$$


5.2.3 Vector Operations in Matrix Form 

In graphics, we use a square matrix to transform a vector represented as a matrix.
For example, if you have a 2D vector $\mathbf{a} = (x_a, y_a)$ and want to rotate it
by 90 degrees about the origin to form vector $\mathbf{a}' = (-y_a, x_a)$, you can
use a product of a 2 x 2 matrix and a 2 x 1 matrix, called a column vector. The
operation in matrix form is

$$\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} x_a \\ y_a \end{bmatrix} = \begin{bmatrix} -y_a \\ x_a \end{bmatrix}.$$

We can get the same result by using the transpose of this matrix and multiplying
on the left (“premultiplying”) with a row vector:

$$\begin{bmatrix} x_a & y_a \end{bmatrix}\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} = \begin{bmatrix} -y_a & x_a \end{bmatrix}.$$

These days, postmultiplication using column vectors is fairly standard, but in
many older books and systems you will run across row vectors and
premultiplication. The only difference is that the transform matrix must be
replaced with its transpose.
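To make the two conventions concrete, here is a small sketch that applies the 90-degree rotation both ways. The helper names mat_vec, vec_mat, and transpose are illustrative, and vectors and matrices are assumed to be plain Python lists.

```python
def transpose(m):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def mat_vec(m, v):
    """Column-vector convention: the matrix times v, with v on the right."""
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]


def vec_mat(v, m):
    """Row-vector convention: v times the matrix, with v on the left."""
    return [sum(v[k] * m[k][j] for k in range(len(v))) for j in range(len(m[0]))]


ROT90 = [[0, -1],
         [1, 0]]

a = [3, 1]
print(mat_vec(ROT90, a))             # column-vector form
print(vec_mat(a, transpose(ROT90)))  # row-vector form with the transposed matrix
```

Both calls produce the same rotated vector (-y_a, x_a), which is the point of the passage: switching conventions only requires transposing the transform matrix.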

We can also use matrix formalism to encode operations on just vectors. If we
consider the result of the dot product as a 1 x 1 matrix, it can be written

$$\mathbf{a} \cdot \mathbf{b} = \mathbf{a}^T\mathbf{b}.$$

For example, if we take two 3D vectors we get

$$\begin{bmatrix} x_a & y_a & z_a \end{bmatrix}\begin{bmatrix} x_b \\ y_b \\ z_b \end{bmatrix} = \begin{bmatrix} x_a x_b + y_a y_b + z_a z_b \end{bmatrix}.$$

A related vector product is the outer product between two vectors, which can
be expressed as a matrix multiplication with a column vector on the left and a row
vector on the right: $\mathbf{a}\mathbf{b}^T$. The result is a matrix consisting of
products of all pairs of an entry of a with an entry of b. For 3D vectors, we have

$$\begin{bmatrix} x_a \\ y_a \\ z_a \end{bmatrix}\begin{bmatrix} x_b & y_b & z_b \end{bmatrix} =
\begin{bmatrix} x_a x_b & x_a y_b & x_a z_b \\ y_a x_b & y_a y_b & y_a z_b \\ z_a x_b & z_a y_b & z_a z_b \end{bmatrix}.$$


It is often useful to think of matrix multiplication in terms of vector operations.
To illustrate using the three-dimensional case, we can think of a 3 x 3 matrix as
a collection of three 3D vectors in two ways: either it is made up of three column
vectors side-by-side, or it is made up of three row vectors stacked up. For instance,
the result of a matrix-vector multiplication y = Ax can be interpreted as a vector
whose entries are the dot products of x with the rows of A. Naming these row
vectors $\mathbf{r}_i$, we have

$$\mathbf{y} = \begin{bmatrix} -\;\mathbf{r}_1\;- \\ -\;\mathbf{r}_2\;- \\ -\;\mathbf{r}_3\;- \end{bmatrix}\mathbf{x}; \qquad y_i = \mathbf{r}_i \cdot \mathbf{x}.$$

Alternatively, we can think of the same product as a sum of the three columns
$\mathbf{c}_i$ of A, weighted by the entries of x:

$$\mathbf{y} = \begin{bmatrix} | & | & | \\ \mathbf{c}_1 & \mathbf{c}_2 & \mathbf{c}_3 \\ | & | & | \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}; \qquad \mathbf{y} = x_1\mathbf{c}_1 + x_2\mathbf{c}_2 + x_3\mathbf{c}_3.$$


Using the same ideas, one can understand a matrix-matrix product AB as an 
array containing the pairwise dot products of all rows of A with all columns of B 
(cf. (5.2)); as a collection of products of the matrix A with all the column vectors 
of B, arranged left to right; as a collection of products of all the row vectors of 
A with the matrix B, stacked top to bottom; or as the sum of the pairwise outer 
products of all columns of A with all rows of B. (See Exercise 8.) 

These interpretations of matrix multiplication can often lead to valuable
geometric interpretations of operations that may otherwise seem very abstract.
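The four readings of a matrix-matrix product listed above can be written out side by side. The sketch below is one possible arrangement, assuming matrices are lists of rows; every function name is illustrative, and all four functions are meant to return the same product.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def column(m, j):
    return [row[j] for row in m]


def by_dot_products(a, b):
    # Entry (i, j) is row i of a dotted with column j of b.
    return [[dot(row, column(b, j)) for j in range(len(b[0]))] for row in a]


def by_columns(a, b):
    # Column j of the product is the matrix a times column j of b.
    cols = [[dot(row, column(b, j)) for row in a] for j in range(len(b[0]))]
    return [list(row) for row in zip(*cols)]   # arrange the columns left to right


def by_rows(a, b):
    # Row i of the product is row i of a (as a row vector) times the matrix b.
    return [[dot(row, column(b, j)) for j in range(len(b[0]))] for row in a]


def by_outer_products(a, b):
    # Sum over k of the outer product (column k of a)(row k of b).
    r, c = len(a), len(b[0])
    total = [[0] * c for _ in range(r)]
    for k in range(len(b)):
        col_k, row_k = column(a, k), b[k]
        for i in range(r):
            for j in range(c):
                total[i][j] += col_k[i] * row_k[j]
    return total
```

In this scalar code the dot-product and row-by-row versions reduce to the same arithmetic; what differs is the grouping, that is, which intermediate vector or matrix each loop nest is understood to build.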

5.2.4 Special Types of Matrices 

The identity matrix is an example of a diagonal matrix, where all non-zero
elements occur along the diagonal. The diagonal consists of those elements whose
column index equals the row index counting from the upper left.


The idea of an orthogonal matrix corresponds to the idea of an orthonormal
basis, not just a set of orthogonal vectors; this is an unfortunate glitch in
terminology.


The identity matrix also has the property that it is the same as its transpose. 
Such matrices are called symmetric. 

The identity matrix is also an orthogonal matrix, because each of its columns
considered as a vector has length 1 and the columns are orthogonal to one another.
The same is true of the rows (see Exercise 2). The determinant of any orthogonal
matrix is either +1 or -1.

A very useful property of orthogonal matrices is that they are nearly their own
inverses: the inverse of an orthogonal matrix is simply its transpose. Multiplying
an orthogonal matrix by its transpose results in the identity,

$$\mathbf{R}^T\mathbf{R} = \mathbf{I} = \mathbf{R}\mathbf{R}^T \quad \text{for orthogonal } \mathbf{R}.$$

This is easy to see because the entries of $\mathbf{R}^T\mathbf{R}$ are dot products
between the columns of R. Off-diagonal entries are dot products between orthogonal
vectors, and the diagonal entries are dot products of the (unit-length) columns with
themselves.

Example. The matrix

$$\begin{bmatrix} 8 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 9 \end{bmatrix}$$

is diagonal, and therefore symmetric, but not orthogonal (the columns are
orthogonal but they are not unit length).

The matrix

$$\begin{bmatrix} 1 & 1 & 2 \\ 1 & 9 & 7 \\ 2 & 7 & 1 \end{bmatrix}$$

is symmetric, but not diagonal or orthogonal.

The matrix

$$\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}$$

is orthogonal, but neither diagonal nor symmetric.
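The three classifications in this example can be tested mechanically from the definitions. The following is a sketch for square matrices stored as lists of rows; the predicate names and the tolerance tol are assumptions made for illustration.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def is_diagonal(m):
    """All entries off the main diagonal are zero."""
    n = len(m)
    return all(m[i][j] == 0 for i in range(n) for j in range(n) if i != j)


def is_symmetric(m):
    """The matrix equals its own transpose."""
    return m == transpose(m)


def is_orthogonal(m, tol=1e-9):
    """Columns are unit length and mutually orthogonal, i.e. R^T R = I."""
    cols = transpose(m)
    n = len(cols)
    for i in range(n):
        for j in range(n):
            d = sum(x * y for x, y in zip(cols[i], cols[j]))
            if abs(d - (1 if i == j else 0)) > tol:
                return False
    return True
```

Applied to the three matrices above, these predicates reproduce the stated classification; the tolerance only matters for matrices with floating-point entries such as rotations.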


5.3 Computing with Matrices and Determinants 

Recall from Section 5.1 that the determinant takes n n-dimensional vectors and
combines them to get a signed n-dimensional volume of the n-dimensional
parallelepiped defined by the vectors. For example, the determinant in 2D is the
area of the parallelogram formed by the vectors. We can use matrices to handle the
mechanics of computing determinants.

If we have 2D vectors r and s, we denote the determinant |rs|; this value is
the signed area of the parallelogram formed by the vectors. Suppose we have
two 2D vectors with Cartesian coordinates (a, b) and (A, B) (Figure 5.7). The
determinant can be written in terms of column vectors or as a shorthand:

$$\left|\begin{bmatrix} a \\ b \end{bmatrix}\begin{bmatrix} A \\ B \end{bmatrix}\right| \equiv \begin{vmatrix} a & A \\ b & B \end{vmatrix} = aB - Ab. \tag{5.8}$$

Note that the determinant of a matrix is the same as the determinant of its
transpose:

$$\begin{vmatrix} a & A \\ b & B \end{vmatrix} = \begin{vmatrix} a & b \\ A & B \end{vmatrix} = aB - Ab.$$

This means that for any parallelogram in 2D there is a “sibling” parallelogram that
has the same area but a different shape (Figure 5.8). For example, the
parallelogram defined by vectors (3,1) and (2,4) has area 10, as does the
parallelogram defined by vectors (3,2) and (1,4).


Example. The geometric meaning of the 3D determinant is helpful in seeing why
certain formulas make sense. For example, the equation of the plane through the
points $(x_i, y_i, z_i)$ for i = 0, 1, 2 is

$$\begin{vmatrix} x - x_0 & x - x_1 & x - x_2 \\ y - y_0 & y - y_1 & y - y_2 \\ z - z_0 & z - z_1 & z - z_2 \end{vmatrix} = 0.$$

Each column is a vector from point $(x_i, y_i, z_i)$ to point (x, y, z). The volume of
the parallelepiped with those vectors as sides is zero only if (x, y, z) is coplanar
with the three other points. Almost all equations involving determinants have
similarly simple underlying geometry.


Figure 5.7. The 2D determinant in Equation 5.8 is the area of the parallelogram
formed by the 2D vectors.

Figure 5.8. The sibling parallelogram has the same area as the parallelogram in
Figure 5.7.

As we saw earlier, we can compute determinants by a brute force expansion
where most terms are zero, and there is a great deal of bookkeeping on plus and
minus signs. The standard way to manage the algebra of computing determinants
is to use a form of Laplace’s expansion. The key part of computing the
determinant this way is to find cofactors of various matrix elements. Each element
of a square matrix has a cofactor, which is the determinant of a matrix with one
fewer row and column, possibly multiplied by minus one. The smaller matrix is
obtained by eliminating the row and column that the element in question is in. For
example, for a 10 x 10 matrix, the cofactor of $a_{82}$ is the determinant of the
9 x 9 matrix with the 8th row and 2nd column eliminated. The sign of a cofactor is
positive if the sum of the row and column indices is even and negative otherwise.
This can be remembered by a checkerboard pattern:

$$\begin{bmatrix} + & - & + & - & \cdots \\ - & + & - & + & \cdots \\ + & - & + & - & \cdots \\ - & + & - & + & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix}.$$
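So, for a 4 x 4 matrix,

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix}.$$

The cofactors of the first row are

$$a^c_{11} = \begin{vmatrix} a_{22} & a_{23} & a_{24} \\ a_{32} & a_{33} & a_{34} \\ a_{42} & a_{43} & a_{44} \end{vmatrix}, \qquad
a^c_{12} = -\begin{vmatrix} a_{21} & a_{23} & a_{24} \\ a_{31} & a_{33} & a_{34} \\ a_{41} & a_{43} & a_{44} \end{vmatrix},$$

$$a^c_{13} = \begin{vmatrix} a_{21} & a_{22} & a_{24} \\ a_{31} & a_{32} & a_{34} \\ a_{41} & a_{42} & a_{44} \end{vmatrix}, \qquad
a^c_{14} = -\begin{vmatrix} a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \\ a_{41} & a_{42} & a_{43} \end{vmatrix}.$$

The determinant of a matrix is found by taking the sum of products of the elements
of any row or column with their cofactors. For example, the determinant of the
4 x 4 matrix above taken about its second column is

$$|\mathbf{A}| = a_{12}a^c_{12} + a_{22}a^c_{22} + a_{32}a^c_{32} + a_{42}a^c_{42}.$$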

We could do a similar expansion about any row or column and they would all 
yield the same result. Note the recursive nature of this expansion. 

Example. A concrete example for the determinant of a particular 3 x 3 matrix by
expanding the cofactors of the first row is

$$\begin{aligned}
\begin{vmatrix} 0 & 1 & 2 \\ 3 & 4 & 5 \\ 6 & 7 & 8 \end{vmatrix}
&= 0\begin{vmatrix} 4 & 5 \\ 7 & 8 \end{vmatrix} - 1\begin{vmatrix} 3 & 5 \\ 6 & 8 \end{vmatrix} + 2\begin{vmatrix} 3 & 4 \\ 6 & 7 \end{vmatrix} \\
&= 0(32 - 35) - 1(24 - 30) + 2(21 - 24) \\
&= 0.
\end{aligned}$$

We can deduce that the volume of the parallelepiped formed by the vectors
defined by the columns (or rows since the determinant of the transpose is the 
same) is zero. This is equivalent to saying that the columns (or rows) are not 
linearly independent. Note that the sum of the first and third rows is twice the 
second row, which implies linear dependence. 
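The recursive nature of Laplace's expansion maps naturally onto a recursive function. The sketch below assumes a square matrix stored as a list of rows and uses 0-based indices, which leaves the checkerboard sign unchanged because the parity of the index sum is the same; the names minor, cofactor, det, and det_by_column are illustrative.

```python
def minor(m, i, j):
    """Copy of m with row i and column j removed (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]


def cofactor(m, i, j):
    """Signed determinant of the minor: + when i + j is even, - otherwise."""
    sign = 1 if (i + j) % 2 == 0 else -1
    return sign * det(minor(m, i, j))


def det(m):
    """Determinant by expansion along the first row."""
    if not m:
        return 1   # empty matrix: lets the cofactor of a 1 x 1 matrix be 1
    return sum(m[0][j] * cofactor(m, 0, j) for j in range(len(m)))


def det_by_column(m, j):
    """The same determinant, expanded about column j instead."""
    return sum(m[i][j] * cofactor(m, i, j) for i in range(len(m)))
```

This expansion visits on the order of n! terms, so it is only sensible for the small matrices typical of graphics; it is meant to mirror the definition rather than to be efficient.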


5.3.1 Computing Inverses 

Determinants give us a tool to compute the inverse of a matrix. It is a very
inefficient method for large matrices, but often in graphics our matrices are
small. A key to developing this method is that the determinant of a matrix with two
identical rows is zero. This should be clear because the volume of the
n-dimensional parallelepiped is zero if two of its sides are the same. Suppose we
have a 4 x 4 matrix A and we wish to find its inverse $\mathbf{A}^{-1}$. The inverse is

$$\mathbf{A}^{-1} = \frac{1}{|\mathbf{A}|}\begin{bmatrix} a^c_{11} & a^c_{21} & a^c_{31} & a^c_{41} \\ a^c_{12} & a^c_{22} & a^c_{32} & a^c_{42} \\ a^c_{13} & a^c_{23} & a^c_{33} & a^c_{43} \\ a^c_{14} & a^c_{24} & a^c_{34} & a^c_{44} \end{bmatrix}.$$

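Note that this is just the transpose of the matrix where elements of A are replaced
by their respective cofactors (each of which already carries its sign of +1 or -1),
multiplied by the leading constant 1/|A|. The transposed matrix of cofactors is
called the adjoint of A. We can see why this is an inverse. Look at the product
$\mathbf{A}\mathbf{A}^{-1}$, which we expect to be the identity. If we multiply the
first row of A by the first column of the adjoint matrix we need to get |A|
(remember the leading constant above divides by |A|):

$$\begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot \end{bmatrix}
\begin{bmatrix} a^c_{11} & \cdot & \cdot & \cdot \\ a^c_{12} & \cdot & \cdot & \cdot \\ a^c_{13} & \cdot & \cdot & \cdot \\ a^c_{14} & \cdot & \cdot & \cdot \end{bmatrix}
= \begin{bmatrix} |\mathbf{A}| & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot \end{bmatrix}.$$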

This is true because the elements in the first row of A are multiplied exactly 
by their cofactors in the first column of the adjoint matrix which is exactly the 
determinant. The other values along the diagonal of the resulting matrix are | A| 
for analogous reasons. The zeros follow a similar logic: 




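$$\begin{bmatrix} \cdot & \cdot & \cdot & \cdot \\ a_{21} & a_{22} & a_{23} & a_{24} \\ \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot \end{bmatrix}
\begin{bmatrix} a^c_{11} & \cdot & \cdot & \cdot \\ a^c_{12} & \cdot & \cdot & \cdot \\ a^c_{13} & \cdot & \cdot & \cdot \\ a^c_{14} & \cdot & \cdot & \cdot \end{bmatrix}
= \begin{bmatrix} \cdot & \cdot & \cdot & \cdot \\ 0 & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot \end{bmatrix}.$$

Note that this product is a determinant of some matrix:

$$a_{21}a^c_{11} + a_{22}a^c_{12} + a_{23}a^c_{13} + a_{24}a^c_{14}.$$

The matrix in fact is

$$\begin{bmatrix} a_{21} & a_{22} & a_{23} & a_{24} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix}.$$

Because the first two rows are identical, the matrix is singular, and thus, its
determinant is zero.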

The argument above does not apply just to four by four matrices; using that
size just simplifies typography. For any matrix, the inverse is the adjoint matrix
divided by the determinant of the matrix being inverted. The adjoint is the
transpose of the cofactor matrix, which is just the matrix whose elements have been
replaced by their cofactors.


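Example. The inverse of one particular three by three matrix whose determinant
is 6 is

$$\begin{aligned}
\begin{bmatrix} 1 & 1 & 2 \\ 1 & 3 & 4 \\ 0 & 2 & 5 \end{bmatrix}^{-1}
&= \frac{1}{6}\begin{bmatrix}
\begin{vmatrix} 3 & 4 \\ 2 & 5 \end{vmatrix} & -\begin{vmatrix} 1 & 2 \\ 2 & 5 \end{vmatrix} & \begin{vmatrix} 1 & 2 \\ 3 & 4 \end{vmatrix} \\[2ex]
-\begin{vmatrix} 1 & 4 \\ 0 & 5 \end{vmatrix} & \begin{vmatrix} 1 & 2 \\ 0 & 5 \end{vmatrix} & -\begin{vmatrix} 1 & 2 \\ 1 & 4 \end{vmatrix} \\[2ex]
\begin{vmatrix} 1 & 3 \\ 0 & 2 \end{vmatrix} & -\begin{vmatrix} 1 & 1 \\ 0 & 2 \end{vmatrix} & \begin{vmatrix} 1 & 1 \\ 1 & 3 \end{vmatrix}
\end{bmatrix} \\
&= \frac{1}{6}\begin{bmatrix} 7 & -1 & -2 \\ -5 & 5 & -2 \\ 2 & -2 & 2 \end{bmatrix}.
\end{aligned}$$

You can check this yourself by multiplying the matrices and making sure you get
the identity.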
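The adjoint construction can be sketched in a few lines. This version assumes integer (or Fraction) entries so that the result is exact, and it uses Python's standard fractions module; the function names are illustrative, and the method is the inefficient-but-simple one described in this section rather than what a numerical library would use.

```python
from fractions import Fraction


def minor(m, i, j):
    """Copy of m with row i and column j removed."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]


def det(m):
    """Determinant by Laplace expansion along the first row."""
    if not m:
        return 1
    return sum(m[0][j] * cofactor(m, 0, j) for j in range(len(m)))


def cofactor(m, i, j):
    sign = 1 if (i + j) % 2 == 0 else -1
    return sign * det(minor(m, i, j))


def inverse(m):
    """Adjoint (transposed cofactor matrix) divided by the determinant."""
    d = det(m)
    if d == 0:
        raise ValueError("matrix is singular")
    n = len(m)
    # Entry (i, j) of the adjoint is the cofactor of element (j, i).
    return [[Fraction(cofactor(m, j, i), d) for j in range(n)] for i in range(n)]
```

Calling inverse on the 3 x 3 matrix of the example yields the matrix above with each entry divided by 6, stored as exact fractions.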


5.3.2 Linear Systems 

We often encounter linear systems in graphics with “n equations and n unknowns,”
usually for n = 2 or n = 3. For example,

$$\begin{aligned}
3x + 7y + 2z &= 4, \\
2x - 4y - 3z &= -1, \\
5x + 2y + z &= 1.
\end{aligned}$$

Here x, y, and z are the “unknowns” for which we wish to solve. We can write
this in matrix form:

$$\begin{bmatrix} 3 & 7 & 2 \\ 2 & -4 & -3 \\ 5 & 2 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 4 \\ -1 \\ 1 \end{bmatrix}.$$

A common shorthand for such systems is Ax = b where it is assumed that A is
a square matrix with known constants, x is an unknown column vector (with
elements x, y, and z in our example), and b is a column matrix of known constants.

There are many ways to solve such systems, and the appropriate method
depends on the properties and dimensions of the matrix A. Because in graphics
we so frequently work with systems of size $n \le 4$, we’ll discuss here a method
appropriate for these systems, known as Cramer’s rule, which we saw earlier,
from a 2D geometric viewpoint, in the example at the end of Section 5.1. Here, we
show this algebraically. The solution to the above equation is

$$x = \frac{\begin{vmatrix} 4 & 7 & 2 \\ -1 & -4 & -3 \\ 1 & 2 & 1 \end{vmatrix}}{\begin{vmatrix} 3 & 7 & 2 \\ 2 & -4 & -3 \\ 5 & 2 & 1 \end{vmatrix}}; \qquad
y = \frac{\begin{vmatrix} 3 & 4 & 2 \\ 2 & -1 & -3 \\ 5 & 1 & 1 \end{vmatrix}}{\begin{vmatrix} 3 & 7 & 2 \\ 2 & -4 & -3 \\ 5 & 2 & 1 \end{vmatrix}}; \qquad
z = \frac{\begin{vmatrix} 3 & 7 & 4 \\ 2 & -4 & -1 \\ 5 & 2 & 1 \end{vmatrix}}{\begin{vmatrix} 3 & 7 & 2 \\ 2 & -4 & -3 \\ 5 & 2 & 1 \end{vmatrix}}.$$

The rule here is to take a ratio of determinants, where the denominator is |A| and
the numerator is the determinant of a matrix created by replacing a column of A
with the column vector b. The column replaced corresponds to the position of
the unknown in vector x. For example, y is the second unknown and the second
column is replaced. Note that if |A| = 0, the division is undefined and Cramer’s
rule yields no solution. This is just another version of the rule that if A is
singular (zero determinant) then there is no unique solution to the equations.
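Cramer's rule for a 3 x 3 system can be sketched as follows. The code assumes integer coefficients and uses exact fractions from the standard library so that the result can be checked by substitution; det3 and cramer3 are illustrative names.

```python
from fractions import Fraction


def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def cramer3(a, rhs):
    """Solve a x = rhs by Cramer's rule; returns [x, y, z] as Fractions."""
    d = det3(a)
    if d == 0:
        raise ValueError("singular matrix: no unique solution")
    solution = []
    for col in range(3):
        # Replace column `col` of a with the right-hand side.
        replaced = [row[:col] + [rhs[k]] + row[col + 1:] for k, row in enumerate(a)]
        solution.append(Fraction(det3(replaced), d))
    return solution


A = [[3, 7, 2], [2, -4, -3], [5, 2, 1]]
b = [4, -1, 1]
x, y, z = cramer3(A, b)
print(x, y, z)
```

Substituting the returned values back into the three equations is a quick way to confirm the result.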


5.4 Eigenvalues and Matrix Diagonalization 

Square matrices have eigenvalues and eigenvectors associated with them. The
eigenvectors are those non-zero vectors whose directions do not change when
multiplied by the matrix. For example, suppose for a matrix A and vector a, we
have

$$\mathbf{A}\mathbf{a} = \lambda\mathbf{a}. \tag{5.9}$$

This means we have stretched or compressed a, but its direction has not changed.
The scale factor $\lambda$ is called the eigenvalue associated with eigenvector a.
Knowing the eigenvalues and eigenvectors of matrices is helpful in a variety of
practical applications. We will describe them to gain insight into geometric
transformation matrices and as a step toward singular values and vectors described
in the next section.

Recall that an orthogonal matrix has orthonormal rows and orthonormal columns.

If we assume a matrix has at least one eigenvector, then we can do a standard
manipulation to find it. First, we write both sides as the product of a square matrix
with the vector a:

$$\mathbf{A}\mathbf{a} = \lambda\mathbf{I}\mathbf{a}, \tag{5.10}$$

where I is an identity matrix. This can be rewritten

$$\mathbf{A}\mathbf{a} - \lambda\mathbf{I}\mathbf{a} = 0. \tag{5.11}$$

Because matrix multiplication is distributive, we can group the matrices:

$$(\mathbf{A} - \lambda\mathbf{I})\,\mathbf{a} = 0. \tag{5.12}$$

This equation can only be true for a non-zero a if the matrix
$(\mathbf{A} - \lambda\mathbf{I})$ is singular, and thus its determinant is zero.
The elements in this matrix are the numbers in A except along the diagonal, where
$\lambda$ is subtracted. For example, for a 2 x 2 matrix the eigenvalues obey

$$\begin{vmatrix} a_{11} - \lambda & a_{12} \\ a_{21} & a_{22} - \lambda \end{vmatrix} = \lambda^2 - (a_{11} + a_{22})\lambda + (a_{11}a_{22} - a_{12}a_{21}) = 0. \tag{5.13}$$

Because this is a quadratic equation, we know there are exactly two solutions for
$\lambda$. These solutions may or may not be unique or real. A similar manipulation
for an n x n matrix will yield an nth-degree polynomial in $\lambda$. Because it is
not possible, in general, to find exact explicit solutions of polynomial equations of
degree greater than four, we can only compute eigenvalues of matrices 4 x 4 or
smaller by analytic methods. For larger matrices, numerical methods are the only
option.

An important special case where eigenvalues and eigenvectors are particularly
simple is symmetric matrices (where $\mathbf{A} = \mathbf{A}^T$). The eigenvalues
of real symmetric matrices are always real numbers, and if they are also distinct,
their eigenvectors are mutually orthogonal. Such matrices can be put into diagonal
form:

$$\mathbf{A} = \mathbf{Q}\mathbf{D}\mathbf{Q}^T, \tag{5.14}$$

where Q is an orthogonal matrix and D is a diagonal matrix. The columns of Q
are the eigenvectors of A and the diagonal elements of D are the eigenvalues of
A. Putting A in this form is also called the eigenvalue decomposition, because it
decomposes A into a product of simpler matrices that reveal its eigenvectors and
eigenvalues.

Example. Given the matrix

$$\mathbf{A} = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix},$$

the eigenvalues of A are the solutions to

$$\lambda^2 - 3\lambda + 1 = 0.$$

We approximate the exact values for compactness of notation:

$$\lambda = \frac{3 \pm \sqrt{5}}{2} \approx \begin{bmatrix} 2.618 \\ 0.382 \end{bmatrix}.$$

Now we can find the associated eigenvectors. The first is the nontrivial (not x =
y = 0) solution to the homogeneous equation,

$$\begin{bmatrix} 2 - 2.618 & 1 \\ 1 & 1 - 2.618 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$

This is approximately (x, y) = (0.8507, 0.5257). Note that there are infinitely
many solutions parallel to that 2D vector, and we just picked the one of unit length.
Similarly the eigenvector associated with $\lambda_2$ is (x, y) = (-0.5257, 0.8507).
This means the diagonal form of A is (within some precision due to our numeric
approximation):

$$\begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix} =
\begin{bmatrix} 0.8507 & -0.5257 \\ 0.5257 & 0.8507 \end{bmatrix}
\begin{bmatrix} 2.618 & 0 \\ 0 & 0.382 \end{bmatrix}
\begin{bmatrix} 0.8507 & 0.5257 \\ -0.5257 & 0.8507 \end{bmatrix}.$$

We will revisit the geometry of this matrix as a transform in the next chapter.
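For the symmetric 2 x 2 case, the quadratic in Equation (5.13) can be solved directly. The sketch below is one way to do so; the function name symmetric_eig2 and its argument layout are assumptions for illustration. Eigenvectors are only determined up to sign, so the second vector may come out as the negative of the one quoted in the example.

```python
import math


def symmetric_eig2(a11, a12, a22):
    """Eigen-decomposition of the symmetric matrix [[a11, a12], [a12, a22]].

    Returns [(eigenvalue, unit_eigenvector), ...], larger eigenvalue first.
    """
    if a12 == 0:
        # Already diagonal: the coordinate axes are the eigenvectors.
        pairs = [(a11, (1.0, 0.0)), (a22, (0.0, 1.0))]
        return sorted(pairs, key=lambda p: p[0], reverse=True)
    trace = a11 + a22
    det = a11 * a22 - a12 * a12
    # Roots of lambda^2 - trace*lambda + det = 0. The discriminant equals
    # (a11 - a22)^2 + 4*a12^2, so it is never negative for symmetric input.
    root = math.sqrt(trace * trace - 4.0 * det)
    pairs = []
    for lam in ((trace + root) / 2.0, (trace - root) / 2.0):
        # The first row of (A - lam I) v = 0 reads (a11 - lam) x + a12 y = 0,
        # which v = (a12, lam - a11) satisfies.
        x, y = a12, lam - a11
        length = math.hypot(x, y)
        pairs.append((lam, (x / length, y / length)))
    return pairs
```

Calling symmetric_eig2(2, 1, 1) reproduces the eigenvalues of the example, approximately 2.618 and 0.382, together with unit-length eigenvectors.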

5.4.1 Singular Value Decomposition 


We saw in the last section that any symmetric matrix can be diagonalized, or
decomposed into a convenient product of orthogonal and diagonal matrices.
However, most matrices we encounter in graphics are not symmetric, and the
eigenvalue decomposition for non-symmetric matrices is not nearly so convenient or
illuminating, and in general involves complex-valued eigenvalues and
eigenvectors even for real-valued inputs.

There is another generalization of the symmetric eigenvalue decomposition to
non-symmetric (and even non-square) matrices; it is the singular value
decomposition (SVD). The main difference between the eigenvalue decomposition of a
symmetric matrix and the SVD of a non-symmetric matrix is that the orthogonal
matrices on the left and right sides are not required to be the same in the SVD:

$$\mathbf{A} = \mathbf{U}\mathbf{S}\mathbf{V}^T.$$

Here U and V are two, potentially different, orthogonal matrices, whose columns
are known as the left and right singular vectors of A, and S is a diagonal matrix
whose entries are known as the singular values of A. When A is symmetric and
has all non-negative eigenvalues, the SVD and the eigenvalue decomposition are
the same.

We would recommend learning in this order: symmetric eigenvalues/vectors,
singular values/vectors, and then unsymmetric eigenvalues, which are much trickier.

There is another relationship between singular values and eigenvalues that
can be used to compute the SVD (though this is not the way an industrial-strength
SVD implementation works). First we define $\mathbf{M} = \mathbf{A}\mathbf{A}^T$.
We assume that we can perform an SVD on A and substitute it into M:

$$\mathbf{M} = \mathbf{A}\mathbf{A}^T = (\mathbf{U}\mathbf{S}\mathbf{V}^T)(\mathbf{U}\mathbf{S}\mathbf{V}^T)^T = \mathbf{U}\mathbf{S}(\mathbf{V}^T\mathbf{V})\mathbf{S}\mathbf{U}^T = \mathbf{U}\mathbf{S}^2\mathbf{U}^T.$$

The substitution is based on the fact that
$(\mathbf{B}\mathbf{C})^T = \mathbf{C}^T\mathbf{B}^T$, that the transpose
of an orthogonal matrix is its inverse, and the transpose of a diagonal matrix
is the matrix itself. The beauty of this new form is that M is symmetric and
$\mathbf{U}\mathbf{S}^2\mathbf{U}^T$ is its eigenvalue decomposition, where
$\mathbf{S}^2$ contains the (all non-negative) eigenvalues. Thus, we find that the
singular values of a matrix are the square roots of the eigenvalues of the product
of the matrix with its transpose, and the left singular vectors are the eigenvectors
of that product. A similar argument allows V, the matrix of right singular vectors,
to be computed from $\mathbf{A}^T\mathbf{A}$.


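Example. We now make this concrete with an example:

$$\mathbf{A} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}; \qquad \mathbf{M} = \mathbf{A}\mathbf{A}^T = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}.$$

We saw the eigenvalue decomposition for this matrix in the previous section. We
observe immediately

$$\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 0.8507 & -0.5257 \\ 0.5257 & 0.8507 \end{bmatrix}\begin{bmatrix} \sqrt{2.618} & 0 \\ 0 & \sqrt{0.382} \end{bmatrix}\mathbf{V}^T.$$

We can solve for V algebraically:

$$\mathbf{V} = (\mathbf{S}^{-1}\mathbf{U}^T\mathbf{A})^T.$$

The inverse of S is a diagonal matrix with the reciprocals of the diagonal elements
of S. This yields

$$\begin{aligned}
\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} &= \mathbf{U}\begin{bmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{bmatrix}\mathbf{V}^T \\
&= \begin{bmatrix} 0.8507 & -0.5257 \\ 0.5257 & 0.8507 \end{bmatrix}\begin{bmatrix} 1.618 & 0 \\ 0 & 0.618 \end{bmatrix}\begin{bmatrix} 0.5257 & 0.8507 \\ -0.8507 & 0.5257 \end{bmatrix}.
\end{aligned}$$

This form used the standard symbol $\sigma_i$ for the ith singular value. Again, for a
symmetric matrix with non-negative eigenvalues, the eigenvalues and the singular
values are the same ($\sigma_i = \lambda_i$).
We will examine the geometry of SVD further in Section 6.1.6.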
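The route described in this section, from the eigen-decomposition of M = AA^T to the SVD of A, can be sketched for an invertible 2 x 2 matrix as follows. As the text notes, this is not how an industrial-strength SVD works; the function names eig_sym2 and svd2 are hypothetical, and the signs of the singular vectors may differ from those printed in the example (the product U S V^T is unaffected).

```python
import math


def eig_sym2(m):
    """Eigenvalues (largest first) and unit eigenvectors of a symmetric 2 x 2 matrix."""
    (a11, a12), (_, a22) = m
    if a12 == 0:
        pairs = [(a11, (1.0, 0.0)), (a22, (0.0, 1.0))]
        return sorted(pairs, key=lambda p: p[0], reverse=True)
    trace, det = a11 + a22, a11 * a22 - a12 * a12
    root = math.sqrt(trace * trace - 4.0 * det)
    pairs = []
    for lam in ((trace + root) / 2.0, (trace - root) / 2.0):
        x, y = a12, lam - a11            # solves the first row of (M - lam I) v = 0
        n = math.hypot(x, y)
        pairs.append((lam, (x / n, y / n)))
    return pairs


def svd2(a):
    """Return (U, sigmas, Vt) with a = U diag(sigmas) Vt, assuming a is invertible."""
    (p, q), (r, s) = a
    m = [[p * p + q * q, p * r + q * s],
         [p * r + q * s, r * r + s * s]]             # M = A A^T
    (l1, u1), (l2, u2) = eig_sym2(m)
    sigmas = [math.sqrt(l1), math.sqrt(l2)]          # singular values
    u = [[u1[0], u2[0]], [u1[1], u2[1]]]             # eigenvectors of M as columns
    # V^T = S^-1 U^T A, so row i of V^T is (u_i^T A) / sigma_i.
    vt = [[(ui[0] * a[0][j] + ui[1] * a[1][j]) / sig for j in range(2)]
          for ui, sig in zip((u1, u2), sigmas)]
    return u, sigmas, vt
```

For the shear matrix of the example, svd2([[1, 1], [0, 1]]) gives singular values of about 1.618 and 0.618, and multiplying the three factors back together recovers the original matrix to floating-point precision.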

Frequently Asked Questions 

• Why is matrix multiplication defined the way it is rather than just element 
by element? 

Element by element multiplication is a perfectly good way to define matrix
multiplication, and indeed it has nice properties. However, in practice it is not
very useful. Ultimately most matrices are used to transform column vectors, e.g., in
3D you might have

$$\mathbf{b} = \mathbf{M}\mathbf{a},$$

where a and b are vectors and M is a 3 x 3 matrix. To allow geometric operations
such as rotation, combinations of all three elements of a must go into each element
of b. That requires us to either go row-by-row or column-by-column through M.
That choice is made based on composition of matrices having the desired property,

$$\mathbf{M}_2(\mathbf{M}_1\mathbf{a}) = (\mathbf{M}_2\mathbf{M}_1)\mathbf{a},$$

which allows us to use one composite matrix
$\mathbf{C} = \mathbf{M}_2\mathbf{M}_1$ to transform our vector.
This is valuable when many vectors will be transformed by the same composite
matrix. So, in summary, the somewhat weird rule for matrix multiplication is
engineered to have these desired properties.

• Sometimes I hear that eigenvalues and singular values are the same 
thing and sometimes that one is the square of the other. Which is right? 

If a real matrix A is symmetric, and its eigenvalues are non-negative, then its
eigenvalues and singular values are the same. If A is not symmetric, the matrix
$\mathbf{M} = \mathbf{A}\mathbf{A}^T$ is symmetric and has non-negative real
eigenvalues. The singular values of A and $\mathbf{A}^T$ are the same and are the
square roots of the singular/eigenvalues of M. Thus, when the square root statement
is made, it is because two different matrices (with a very particular relationship)
are being talked about: $\mathbf{M} = \mathbf{A}\mathbf{A}^T$.


Notes 

The discussion of determinants as volumes is based on A Vector Space Approach
to Geometry (Hausner, 1998). Hausner has an excellent discussion of vector
analysis and the fundamentals of geometry as well. The geometric derivation
of Cramer’s rule in 2D is taken from Practical Linear Algebra: A Geometry
Toolbox (Farin & Hansford, 2004). That book also has geometric interpretations of
other linear algebra operations such as Gaussian elimination. The discussion of
eigenvalues and singular values is based primarily on Linear Algebra and Its
Applications (Strang, 1988). The example of SVD of the shear matrix is based on a
discussion in Computer Graphics and Geometric Modeling (Salomon, 1999).


Exercises 

1. Write an implicit equation for the 2D line through points $(x_0, y_0)$ and
$(x_1, y_1)$ using a 2D determinant.

2. Show that if the columns of a matrix are orthonormal, then so are the rows. 

3. Prove the properties of matrix determinants stated in Equations (5.5)-(5.7). 

4. Show that the eigenvalues of a diagonal matrix are its diagonal elements. 

5. Show that for a square matrix A, $\mathbf{A}\mathbf{A}^T$ is a symmetric matrix.

6. Show that for three 3D vectors a, b, c, the following identity holds: |abc| = 
(a x b) • c. 

7. Explain why the volume of the tetrahedron with side vectors a, b, c (see 
Figure 5.2) is given by |abc|/6. 

8. Demonstrate the four interpretations of matrix-matrix multiplication by
taking the following matrix-matrix multiplication code, rearranging the nested
loops, and interpreting the resulting code in terms of matrix and vector
operations.

function mat-mult(in a[m][p], in b[p][n], out c[m][n]) {
    // the array c is initialized to zero
    for i = 1 to m
        for j = 1 to n
            for k = 1 to p
                c[i][j] += a[i][k] * b[k][j]
}

9. Prove that if A, Q, and D satisfy Equation (5.14), v is the 7th row of Q, 
and A is the 7th entry on the diagonal of D, then v is an eigenvector of A 
with eigenvalue A. 
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10. Prove that if A, Q, and D satisfy Equation (5.14), the eigenvalues of A are 
all distinct, and v is an eigenvector of A with eigenvalue A, then for some 
i, v is the ith row of Q and A is the ith entry on the diagonal of D. 

11. Given the (x, y) coordinates of the three vertices of a 2D triangle, explain
why the area is given by

$$\frac{1}{2}\begin{vmatrix} x_0 & x_1 & x_2 \\ y_0 & y_1 & y_2 \\ 1 & 1 & 1 \end{vmatrix}.$$


Transformation Matrices


The machinery of linear algebra can be used to express many of the operations
required to arrange objects in a 3D scene, view them with cameras, and get them
onto the screen. Geometric transformations like rotation, translation, scaling, and
projection can be accomplished with matrix multiplication, and the transformation
matrices used to do this are the subject of this chapter.

We will show how a set of points transforms if the points are represented as 
offset vectors from the origin, and we will use the clock shown in Figure 6.1 as 
an example of a point set. So think of the clock as a bunch of points that are the 
ends of vectors whose tails are at the origin. We also discuss how these transforms 
operate differently on locations (points), displacement vectors, and surface normal 
vectors. 


6.1 2D Linear Transformations 

We can use a 2 x 2 matrix to change, or transform, a 2D vector: 


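$$\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix} =
\begin{bmatrix} a_{11}x + a_{12}y \\ a_{21}x + a_{22}y \end{bmatrix}.$$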


This kind of operation, which takes in a 2-vector and produces another 2-vector 
by a simple matrix multiplication, is a linear transformation. 

By this simple formula we can achieve a variety of useful transformations,
depending on what we put in the entries of the matrix, as will be discussed in
the following sections. For our purposes, consider moving along the x-axis a
horizontal move and along the y-axis a vertical move.


6.1.1 Scaling 


The most basic transform is a scale along the coordinate axes. This transform can 
change length and possibly direction: 

$$\text{scale}(s_x, s_y) = \begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix}.$$


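Note what this matrix does to a vector with Cartesian components (x, y):

$$\begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix} =
\begin{bmatrix} s_x x \\ s_y y \end{bmatrix}.$$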


So just by looking at the matrix of an axis-aligned scale we can read off the two 
scale factors. 


Example. The matrix that shrinks x and y uniformly by a factor of two is
(Figure 6.1)

$$\text{scale}(0.5, 0.5) = \begin{bmatrix} 0.5 & 0 \\ 0 & 0.5 \end{bmatrix}.$$

A matrix which halves in the horizontal and increases by three-halves in the
vertical is (see Figure 6.2)

$$\text{scale}(0.5, 1.5) = \begin{bmatrix} 0.5 & 0 \\ 0 & 1.5 \end{bmatrix}.$$


Figure 6.1. Scaling uniformly by half for each axis: The axis-aligned scale matrix has
the proportion of change in each of the diagonal elements and zeroes in the off-diagonal
elements.

Figure 6.2. Scaling non-uniformly in x and y: The scaling matrix is diagonal with non-equal
elements. Note that the square outline of the clock becomes a rectangle and the circular
face becomes an ellipse.


6.1.2 Shearing 

A shear is something that pushes things sideways, producing something like a 
deck of cards across which you push your hand; the bottom card stays put and 
cards move more the closer they are to the top of the deck. The horizontal and 
vertical shear matrices are 

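$$\text{shear-x}(s) = \begin{bmatrix} 1 & s \\ 0 & 1 \end{bmatrix}, \qquad
\text{shear-y}(s) = \begin{bmatrix} 1 & 0 \\ s & 1 \end{bmatrix}.$$

Example. The transform that shears horizontally so that vertical lines become 45°
lines leaning towards the right is (see Figure 6.3)

$$\text{shear-x}(1) = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}.$$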


Figure 6.3. An x-shear matrix moves points to the right in proportion to their y-coordinate. 
Now the square outline of the clock becomes a parallelogram and, as with scaling, the circular 
face of the clock becomes an ellipse.

Figure 6.4. A y-shear matrix moves points up in proportion to their x-coordinate.


An analogous transform vertically is (see Figure 6.4)

$$\text{shear-y}(1) = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}.$$


In fact, the image of a circle under any matrix transformation is an ellipse.


In both cases the square outline of the sheared clock becomes a parallelogram, 
and the circular face of the sheared clock becomes an ellipse. 

Another way to think of a shear is in terms of rotation of only the vertical
(or horizontal) axis. The shear transform that takes a vertical axis and tilts it
clockwise by an angle $\phi$ is

$$\begin{bmatrix} 1 & \tan\phi \\ 0 & 1 \end{bmatrix}.$$

Similarly, the shear matrix which rotates the horizontal axis counterclockwise by
angle $\phi$ is

$$\begin{bmatrix} 1 & 0 \\ \tan\phi & 1 \end{bmatrix}.$$
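The shear matrices can be exercised with a short Python sketch. It checks that shear-x(1) leaves the bottom of the clock fixed while pushing the top sideways, and that a shear factor of tan φ tilts the vertical axis by φ. The function names and the 30° test angle are assumptions chosen for illustration.

```python
import math


def apply(m, v):
    """Multiply a 2x2 matrix (row-major nested lists) by a 2D vector."""
    (a11, a12), (a21, a22) = m
    x, y = v
    return (a11 * x + a12 * y, a21 * x + a22 * y)


def shear_x(s):
    """Horizontal shear: x grows in proportion to y, y is unchanged."""
    return [[1.0, s], [0.0, 1.0]]


def shear_y(s):
    """Vertical shear: y grows in proportion to x, x is unchanged."""
    return [[1.0, 0.0], [s, 1.0]]


# shear-x(1): points on the x-axis stay put, points higher up move right.
bottom = apply(shear_x(1.0), (1.0, 0.0))
top = apply(shear_x(1.0), (0.0, 1.0))

# A shear factor of tan(phi) tilts the vertical axis clockwise by phi.
phi = math.radians(30.0)
tilted = apply(shear_x(math.tan(phi)), (0.0, 1.0))
tilt_deg = math.degrees(math.atan2(tilted[0], tilted[1]))  # angle measured from vertical
print(bottom, top, tilt_deg)
```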


6.1.3 Rotation 

Suppose we want to rotate a vector a by an angle $\phi$ counterclockwise to get
vector b (Figure 6.5). If a makes an angle $\alpha$ with the x-axis, and its length is
$r = \sqrt{x_a^2 + y_a^2}$, then we know that

$$\begin{aligned}
x_a &= r\cos\alpha, \\
y_a &= r\sin\alpha.
\end{aligned}$$

Because b is a rotation of a, it also has length r. Because it is rotated an angle
$\phi$ from a, b makes an angle $(\alpha + \phi)$ with the x-axis. Using the trigonometric
addition identities (Section 2.3.3):

$$\begin{aligned}
x_b &= r\cos(\alpha + \phi) = r\cos\alpha\cos\phi - r\sin\alpha\sin\phi, \\
y_b &= r\sin(\alpha + \phi) = r\sin\alpha\cos\phi + r\cos\alpha\sin\phi.
\end{aligned} \tag{6.1}$$

Substituting $x_a = r\cos\alpha$ and $y_a = r\sin\alpha$ gives

$$\begin{aligned}
x_b &= x_a\cos\phi - y_a\sin\phi, \\
y_b &= y_a\cos\phi + x_a\sin\phi.
\end{aligned}$$

In matrix form, the transformation that takes a to b is then

$$\text{rotate}(\phi) = \begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix}.$$

Example. A matrix that rotates vectors by π/4 radians (45 degrees) is (see
Figure 6.6)

$$\begin{bmatrix} \cos\frac{\pi}{4} & -\sin\frac{\pi}{4} \\ \sin\frac{\pi}{4} & \cos\frac{\pi}{4} \end{bmatrix} =
\begin{bmatrix} 0.707 & -0.707 \\ 0.707 & 0.707 \end{bmatrix}.$$

Figure 6.6. A rotation by 45 degrees. Note that the rotation is counterclockwise and that
cos(45°) = sin(45°) ≈ .707.
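The rotation matrix can be checked numerically with a small Python sketch. It confirms that the rotated vector keeps its length and that the angle between input and output is the rotation angle. The names `rotate` and `apply` and the test vector are assumptions made for this example.

```python
import math


def rotate(phi):
    """Counterclockwise rotation by phi radians."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s], [s, c]]


def apply(m, v):
    """Multiply a 2x2 matrix (row-major nested lists) by a 2D vector."""
    (a11, a12), (a21, a22) = m
    x, y = v
    return (a11 * x + a12 * y, a21 * x + a22 * y)


a = (3.0, 4.0)                       # hypothetical vector of length 5
b = apply(rotate(math.pi / 4), a)    # rotated 45 degrees counterclockwise

length_a = math.hypot(a[0], a[1])
length_b = math.hypot(b[0], b[1])
angle_deg = math.degrees(math.atan2(b[1], b[0]) - math.atan2(a[1], a[0]))
print(length_a, length_b, angle_deg)
```

A negative argument, such as `rotate(-math.pi / 6)`, gives a clockwise rotation under the same convention.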


A matrix that rotates by π/6 radians (30 degrees) in the clockwise direction is
a rotation by −π/6 radians in our framework (see Figure 6.7):

$$\begin{bmatrix} \cos\frac{-\pi}{6} & -\sin\frac{-\pi}{6} \\ \sin\frac{-\pi}{6} & \cos\frac{-\pi}{6} \end{bmatrix} =
\begin{bmatrix} 0.866 & 0.5 \\ -0.5 & 0.866 \end{bmatrix}.$$

Figure 6.5. The geometry for Equation (6.1).

Figure 6.7. A rotation by minus thirty degrees. Note that the rotation is clockwise and that
cos(−30°) ≈ .866 and sin(−30°) = −.5.


Said briefly, $\mathbf{R}\mathbf{e}_i = \mathbf{u}_i$ and
$\mathbf{R}\mathbf{v}_i = \mathbf{e}_i$, for a rotation with
columns $\mathbf{u}_i$ and rows $\mathbf{v}_i$.


Because the norm of each row of a rotation matrix is one ($\sin^2\phi + \cos^2\phi = 1$),
and the rows are orthogonal ($\cos\phi(-\sin\phi) + \sin\phi\cos\phi = 0$), we see that
rotation matrices are orthogonal matrices (Section 5.2.4). By looking at the matrix
we can read off two pairs of orthonormal vectors: the two columns, which are the
vectors to which the transformation sends the canonical basis vectors (1,0) and
(0,1); and the rows, which are the vectors that the transformation sends to the
canonical basis vectors.


6.1.4 Reflection 

We can reflect a vector across either of the coordinate axes by using a scale with 
one negative scale factor (see Figures 6.8 and 6.9): 


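$$\text{reflect-y} = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad
\text{reflect-x} = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}.$$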



Figure 6.8. A reflection about the y-axis is achieved by multiplying all x-coordinates by -1.

Figure 6.9. A reflection about the x-axis is achieved by multiplying all y-coordinates by -1.


While one might expect that the matrix with −1 in both elements of the diagonal
is also a reflection, in fact it is just a rotation by π radians.


This rotation can also be 
called a “reflection through 
the origin.” 


6.1.5 Composition and Decomposition of Transformations 

It is common for graphics programs to apply more than one transformation to an 
object. For example, we might want to first apply a scale S, and then a rotation 
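R. This would be done in two steps on a 2D vector $\mathbf{v}_1$:

$$\text{first, } \mathbf{v}_2 = \mathbf{S}\mathbf{v}_1, \quad \text{then, } \mathbf{v}_3 = \mathbf{R}\mathbf{v}_2.$$

Another way to write this is

$$\mathbf{v}_3 = \mathbf{R}(\mathbf{S}\mathbf{v}_1).$$

Because matrix multiplication is associative, we can also write

$$\mathbf{v}_3 = (\mathbf{R}\mathbf{S})\mathbf{v}_1.$$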


In other words, we can represent the effects of transforming a vector by two
matrices in sequence using a single matrix of the same size, which we can compute
by multiplying the two matrices: M = RS (Figure 6.10).

It is very important to remember that these transforms are applied from the
right side first. So the matrix M = RS first applies S and then R.


Figure 6.10. Applying the two transform matrices in sequence is the same as applying the 
product of those matrices once. This is a key concept that underlies most graphics hardware 
and software. 


Example. Suppose we want to scale by one-half in the vertical direction and then
rotate by π/4 radians (45 degrees). The resulting matrix is

$$\begin{bmatrix} 0.707 & -0.707 \\ 0.707 & 0.707 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 0 & 0.5 \end{bmatrix} =
\begin{bmatrix} 0.707 & -0.353 \\ 0.707 & 0.353 \end{bmatrix}.$$

It is important to always remember that matrix multiplication is not commutative.
So the order of transforms does matter. In this example, rotating first, and then
scaling, results in a different matrix (see Figure 6.11):

$$\begin{bmatrix} 1 & 0 \\ 0 & 0.5 \end{bmatrix}
\begin{bmatrix} 0.707 & -0.707 \\ 0.707 & 0.707 \end{bmatrix} =
\begin{bmatrix} 0.707 & -0.707 \\ 0.353 & 0.353 \end{bmatrix}.$$
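The two products in this example can be reproduced with a few lines of Python. The sketch also checks that applying S and then R in two steps matches a single multiplication by RS. The helper names and the test vector are assumptions for illustration; note that the code rounds 0.35355 where the text truncates to 0.353.

```python
import math


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def apply(m, v):
    """Multiply a 2x2 matrix by a 2D vector."""
    return tuple(m[i][0] * v[0] + m[i][1] * v[1] for i in range(2))


c = math.cos(math.pi / 4)      # cos(45 deg), also used for sin(45 deg)
R = [[c, -c], [c, c]]          # rotate by 45 degrees
S = [[1.0, 0.0], [0.0, 0.5]]   # halve the vertical direction

RS = matmul(R, S)              # scale first, then rotate
SR = matmul(S, R)              # rotate first, then scale

v = (1.0, 2.0)                 # hypothetical test vector
two_steps = apply(R, apply(S, v))
one_step = apply(RS, v)
print(RS, SR)
```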


Example. Using the scale matrices we have presented, nonuniform scaling can
only be done along the coordinate axes. If we wanted to stretch our clock by
50% along one of its diagonals, so that 8:00 through 1:00 move to the northwest
and 2:00 through 7:00 move to the southeast, we can use rotation matrices in
combination with an axis-aligned scaling matrix to get the result we want. The
idea is to use a rotation to align the scaling axis with a coordinate axis, then
scale along that axis, then rotate back. In our example, the scaling axis is the
“backslash” diagonal of the square, and we can make it parallel to the x-axis with
a rotation by +45°. Putting these operations together, the full transformation is

$$\text{rotate}(-45^\circ)\ \text{scale}(1.5, 1)\ \text{rotate}(45^\circ).$$

Figure 6.11. The order in which two transforms are applied is usually important. In this
example, we do a scale by one-half in y and then rotate by 45°. Reversing the order in which
these two transforms are applied yields a different result.


Remember to read the 
transformations from right 
to left. 


In mathematical notation, this can be written $\mathbf{R}\mathbf{S}\mathbf{R}^T$. The result of
multiplying the three matrices together is

$$\begin{bmatrix} 1.25 & -0.25 \\ -0.25 & 1.25 \end{bmatrix}.$$


It is no coincidence that this matrix is symmetric; try applying the
transpose-of-product rule to the formula $\mathbf{R}\mathbf{S}\mathbf{R}^T$.


Building up a transformation from rotation and scaling transformations actually
works for any linear transformation at all, and this fact leads to a powerful
way of thinking about these transformations, as explored in the next section.
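The diagonal stretch from the clock example can be verified numerically. The Python sketch below multiplies the three matrices and then checks that the "backslash" diagonal is stretched by 1.5 while the other diagonal is left alone. The helper names are assumptions, and angles are passed in degrees for readability.

```python
import math


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def apply(m, v):
    """Multiply a 2x2 matrix by a 2D vector."""
    return tuple(m[i][0] * v[0] + m[i][1] * v[1] for i in range(2))


def rotate(degrees):
    """Counterclockwise rotation; the angle is given in degrees."""
    phi = math.radians(degrees)
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s], [s, c]]


def scale(sx, sy):
    return [[sx, 0.0], [0.0, sy]]


# Read right to left: rotate by +45, scale x by 1.5, rotate back by -45.
M = matmul(rotate(-45.0), matmul(scale(1.5, 1.0), rotate(45.0)))

backslash = apply(M, (-1.0, 1.0))   # along the scaling axis: stretched by 1.5
slash = apply(M, (1.0, 1.0))        # perpendicular to it: unchanged
print(M, backslash, slash)
```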


6.1.6 Decomposition of Transformations 

Sometimes it’s necessary to “undo” a composition of transformations, taking a
transformation apart into simpler pieces. For instance, it’s often useful to present
a transformation to the user for manipulation in terms of separate rotations and
scale factors, but a transformation might be represented internally simply as a
matrix, with the rotations and scales already mixed together. This kind of
manipulation can be achieved if the matrix can be computationally disassembled into the
desired pieces, the pieces adjusted, and the matrix reassembled by multiplying the
pieces together again.

Figure 6.12. Singular Value Decomposition (SVD) for a shear matrix. Any 2D matrix can
be decomposed into a product of rotation, scale, rotation. Note that the circular face of the
clock must become an ellipse because it is just a rotated and scaled circle.

It turns out that this decomposition, or factorization, is possible, regardless of 
the entries in the matrix—and this fact provides a fruitful way of thinking about 
transformations and what they do to geometry that is transformed by them. 


Symmetric Eigenvalue Decomposition 

Let’s start with symmetric matrices. Recall from Section 5.4 that a symmetric
matrix can always be taken apart using the eigenvalue decomposition into a product
of the form

$$\mathbf{A} = \mathbf{R}\mathbf{S}\mathbf{R}^T$$

where R is an orthogonal matrix and S is a diagonal matrix; we will call the
columns of R (the eigenvectors) by the names $\mathbf{v}_1$ and $\mathbf{v}_2$, and we'll call the
diagonal entries of S (the eigenvalues) by the names $\lambda_1$ and $\lambda_2$.

In geometric terms we can now recognize R as a rotation and S as a scale, so
this is just a multi-step geometric transformation (Figure 6.13):

1. Rotate $\mathbf{v}_1$ and $\mathbf{v}_2$ to the x- and y-axes (the transform by $\mathbf{R}^T$).

2. Scale in x and y by $(\lambda_1, \lambda_2)$ (the transform by S).

3. Rotate the x- and y-axes back to $\mathbf{v}_1$ and $\mathbf{v}_2$ (the transform by R).

Looking at the effect of these three transforms together, we can see that they have
the effect of a nonuniform scale along a pair of axes. As with an axis-aligned
scale, the axes are perpendicular, but they aren’t the coordinate axes; instead they
are the eigenvectors of A. This tells us something about what it means to be a
symmetric matrix: symmetric matrices are just scaling operations, albeit
potentially nonuniform and non-axis-aligned ones.


If you like to count dimensions: a symmetric 2 × 2 matrix has 3 degrees of
freedom, and the eigenvalue decomposition rewrites them as a rotation
angle and two scale factors.


Figure 6.13. What happens when the unit circle is transformed by an arbitrary symmetric
matrix A, also known as a non-axis-aligned, nonuniform scale. The two perpendicular
vectors $\mathbf{v}_1$ and $\mathbf{v}_2$, which are the eigenvectors of A, remain fixed in direction but get scaled. In
terms of elementary transformations, this can be seen as first rotating the eigenvectors to
the canonical basis, doing an axis-aligned scale, and then rotating the canonical basis back
to the eigenvectors.

Figure 6.14. A symmetric matrix is always a scale along some axis. In this case it is along
the $\phi$ = 31.7° direction which means the real eigenvector for this matrix is in that direction.


Example. Recall the example from Section 5.4: 

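$$\begin{aligned}
\begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}
&= \mathbf{R} \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} \mathbf{R}^T \\
&= \begin{bmatrix} 0.8507 & -0.5257 \\ 0.5257 & 0.8507 \end{bmatrix}
\begin{bmatrix} 2.618 & 0 \\ 0 & 0.382 \end{bmatrix}
\begin{bmatrix} 0.8507 & 0.5257 \\ -0.5257 & 0.8507 \end{bmatrix} \\
&= \text{rotate}(31.7^\circ)\ \text{scale}(2.618, 0.382)\ \text{rotate}(-31.7^\circ).
\end{aligned}$$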


The matrix above, then, according to its eigenvalue decomposition, scales in a 
direction 31.7° counterclockwise from three o’clock (the x-axis). This is a touch 
before 2 p.m. on the clockface as is confirmed by Figure 6.14. 

We can also reverse the diagonalization process; to scale by $(\lambda_1, \lambda_2)$ with the
first scaling direction an angle $\phi$ clockwise from the x-axis, we have

$$\begin{bmatrix} \cos\phi & \sin\phi \\ -\sin\phi & \cos\phi \end{bmatrix}
\begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}
\begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix} =
\begin{bmatrix}
\lambda_1\cos^2\phi + \lambda_2\sin^2\phi & (\lambda_2 - \lambda_1)\cos\phi\sin\phi \\
(\lambda_2 - \lambda_1)\cos\phi\sin\phi & \lambda_2\cos^2\phi + \lambda_1\sin^2\phi
\end{bmatrix}.$$

We should take heart that this is a symmetric matrix as we know must be true
since we constructed it from a symmetric eigenvalue decomposition.

Figure 6.15. What happens when the unit circle is transformed by an arbitrary matrix A.
The two perpendicular vectors $\mathbf{v}_1$ and $\mathbf{v}_2$, which are the right singular vectors of A, get scaled
and changed in direction to match the left singular vectors, $\mathbf{u}_1$ and $\mathbf{u}_2$. In terms of elementary
transformations, this can be seen as first rotating the right singular vectors to the canonical
basis, doing an axis-aligned scale, and then rotating the canonical basis to the left singular
vectors.
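For a 2 × 2 symmetric matrix, the eigenvalue decomposition can be written in closed form, which makes it easy to experiment with. The Python sketch below is one possible version under stated assumptions: the function names are hypothetical, the closed-form formulas are the standard ones for a 2 × 2 symmetric matrix rather than something given in the text, and the angle is measured counterclockwise (so the off-diagonal sign is the opposite of the clockwise formula above).

```python
import math


def sym_eig_2x2(a, b, d):
    """Decompose the symmetric matrix [[a, b], [b, d]].

    Returns (lam1, lam2, phi) with lam1 >= lam2, where phi is the
    counterclockwise angle (radians) of the first scaling direction.
    """
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    phi = 0.5 * math.atan2(2.0 * b, a - d)
    return mean + radius, mean - radius, phi


def from_scale_and_angle(lam1, lam2, phi):
    """Multiply out rotate(phi) scale(lam1, lam2) rotate(-phi)."""
    c, s = math.cos(phi), math.sin(phi)
    off = (lam1 - lam2) * c * s
    return [[lam1 * c * c + lam2 * s * s, off],
            [off, lam2 * c * c + lam1 * s * s]]


# The example matrix [[2, 1], [1, 1]] from the text.
lam1, lam2, phi = sym_eig_2x2(2.0, 1.0, 1.0)
rebuilt = from_scale_and_angle(lam1, lam2, phi)
print(lam1, lam2, math.degrees(phi), rebuilt)
```

Taking a matrix apart with `sym_eig_2x2`, adjusting the scale factors or the angle, and reassembling with `from_scale_and_angle` is the disassemble-adjust-reassemble workflow described at the start of this section.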


Singular Value Decomposition 

A very similar kind of decomposition can be done with non-symmetric matrices
as well: it’s the Singular Value Decomposition (SVD), also discussed in
Section 5.4.1. The difference is that the matrices on either side of the diagonal matrix
are no longer the same:

$$\mathbf{A} = \mathbf{U}\mathbf{S}\mathbf{V}^T$$

The two orthogonal matrices that replace the single rotation R are called U and
V, and their columns are called $\mathbf{u}_i$ (the left singular vectors) and $\mathbf{v}_i$ (the right
singular vectors), respectively. In this context, the diagonal entries of S are called
singular values rather than eigenvalues. The geometric interpretation is very
similar to that of the symmetric eigenvalue decomposition (Figure 6.15):

1. Rotate $\mathbf{v}_1$ and $\mathbf{v}_2$ to the x- and y-axes (the transform by $\mathbf{V}^T$).

2. Scale in x and y by $(\sigma_1, \sigma_2)$ (the transform by S).

3. Rotate the x- and y-axes to $\mathbf{u}_1$ and $\mathbf{u}_2$ (the transform by U).

The principal difference is between a single rotation and two different orthogonal
matrices. This difference causes another, less important, difference. Because the
SVD has different singular vectors on the two sides, there is no need for
negative singular values: we can always flip the sign of a singular value, reverse
the direction of one of the associated singular vectors, and end up with the same
transformation again. For this reason, the SVD always produces a diagonal
matrix with nonnegative entries, but the matrices U and V are not guaranteed to be
rotations; they could include reflection as well. In geometric applications like
graphics this is an inconvenience, but a minor one: it is easy to differentiate
rotations from reflections by checking the determinant, which is +1 for rotations
and −1 for reflections, and if rotations are desired, one of the singular values can
be negated, resulting in a rotation-scale-rotation sequence where the reflection is
rolled in with the scale, rather than with one of the rotations.


For dimension counters: a general 2 × 2 matrix has 4 degrees of freedom, and
the SVD rewrites them as two rotation angles and two scale factors. One more bit
is needed to keep track of reflections, but that doesn't add a dimension.


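Example. The example used in Section 5.4.1 is in fact a shear matrix (Figure 6.12):

$$\begin{aligned}
\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}
&= \mathbf{R}_2 \begin{bmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{bmatrix} \mathbf{R}_1 \\
&= \begin{bmatrix} 0.8507 & -0.5257 \\ 0.5257 & 0.8507 \end{bmatrix}
\begin{bmatrix} 1.618 & 0 \\ 0 & 0.618 \end{bmatrix}
\begin{bmatrix} 0.5257 & 0.8507 \\ -0.8507 & 0.5257 \end{bmatrix} \\
&= \text{rotate}(31.7^\circ)\ \text{scale}(1.618, 0.618)\ \text{rotate}(-58.3^\circ).
\end{aligned}$$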

An immediate consequence of the existence of SVD is that all the 2D transformation
matrices we have seen can be made from rotation matrices and scale matrices.
Shear matrices are a convenience, but they are not required for expressing
transformations.

In summary, every matrix can be decomposed via SVD into a rotation times 
a scale times another rotation. Only symmetric matrices can be decomposed via 
eigenvalue diagonalization into a rotation times a scale times the inverse-rotation, 
and such matrices are a simple scale in an arbitrary direction. The SVD of a 
symmetric matrix will yield the same triple product as eigenvalue decomposition 
via a slightly more complex algebraic manipulation. 
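The shear example can be checked without a linear algebra library. The Python sketch below computes the singular values of the shear matrix as the square roots of the eigenvalues of AᵀA (a standard fact, not derived in this section) and then reassembles the matrix from the rounded angles quoted in the example. The helper names are assumptions, and because the angles are rounded to 0.1° the result only matches to about three decimal places.

```python
import math


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def rotate(degrees):
    """Counterclockwise rotation; the angle is given in degrees."""
    phi = math.radians(degrees)
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s], [s, c]]


A = [[1.0, 1.0], [0.0, 1.0]]   # the shear matrix from the example

# Singular values: square roots of the eigenvalues of the symmetric matrix A^T A.
At = [[A[0][0], A[1][0]], [A[0][1], A[1][1]]]
AtA = matmul(At, A)
mean = 0.5 * (AtA[0][0] + AtA[1][1])
radius = math.hypot(0.5 * (AtA[0][0] - AtA[1][1]), AtA[0][1])
sigma1 = math.sqrt(mean + radius)
sigma2 = math.sqrt(mean - radius)

# Reassemble rotate(31.7) scale(sigma1, sigma2) rotate(-58.3).
S = [[sigma1, 0.0], [0.0, sigma2]]
rebuilt = matmul(rotate(31.7), matmul(S, rotate(-58.3)))
print(sigma1, sigma2, rebuilt)
```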


Paeth Decomposition of Rotations 

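Another decomposition uses shears to represent non-zero rotations (Paeth, 1990).
The following identity allows this:

$$\begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix} =
\begin{bmatrix} 1 & \frac{\cos\phi - 1}{\sin\phi} \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ \sin\phi & 1 \end{bmatrix}
\begin{bmatrix} 1 & \frac{\cos\phi - 1}{\sin\phi} \\ 0 & 1 \end{bmatrix}.$$

For example, a rotation by π/4 (45 degrees) is (see Figure 6.16)

$$\text{rotate}\left(\frac{\pi}{4}\right) =
\begin{bmatrix} 1 & 1 - \sqrt{2} \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ \frac{\sqrt{2}}{2} & 1 \end{bmatrix}
\begin{bmatrix} 1 & 1 - \sqrt{2} \\ 0 & 1 \end{bmatrix}. \tag{6.2}$$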


Figure 6.16. Any 2D rotation can be accomplished by three shears in sequence. In this
case a rotation by 45° is decomposed as shown in Equation 6.2.

This particular transform is useful for raster rotation because shearing is a
very efficient raster operation for images; it introduces some jagginess, but will
leave no holes. The key observation is that if we take a raster position (i, j) and
apply a horizontal shear to it, we get

$$\begin{bmatrix} 1 & s \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} i \\ j \end{bmatrix} =
\begin{bmatrix} i + sj \\ j \end{bmatrix}.$$

If we round sj to the nearest integer, this amounts to taking each row in the
image and moving it sideways by some amount, a different amount for each
row. Because it is the same displacement within a row, this allows us to rotate
with no gaps in the resulting image. A similar action works for a vertical shear.
Thus, we can implement a simple raster rotation easily.
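The three-shear identity and the row-shifting idea can both be sketched in a few lines of Python. This is an illustration under assumptions, not Paeth's published implementation: the function names are hypothetical, the image is a sparse dictionary of pixel positions rather than a real raster, and only the horizontal shear pass is shown.

```python
import math


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def paeth_shears(phi):
    """Return the x-shear, y-shear, x-shear factors of rotate(phi).

    Requires sin(phi) != 0, i.e., phi must not be a multiple of pi.
    """
    a = (math.cos(phi) - 1.0) / math.sin(phi)
    x_shear = [[1.0, a], [0.0, 1.0]]
    y_shear = [[1.0, 0.0], [math.sin(phi), 1.0]]
    return x_shear, y_shear, x_shear


def shear_rows(pixels, s):
    """Raster x-shear: pixel (i, j) moves to (i + round(s * j), j).

    Every pixel in row j gets the same integer shift, so no two pixels
    collide and no gaps open up inside a row.
    """
    return {(i + round(s * j), j): value for (i, j), value in pixels.items()}


phi = math.pi / 4
x1, y, x2 = paeth_shears(phi)
product = matmul(x1, matmul(y, x2))   # should equal rotate(45 degrees)

image = {(i, j): 1 for i in range(4) for j in range(4)}   # hypothetical 4x4 block
sheared = shear_rows(image, x1[0][1])
print(product, len(sheared))
```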


6.2 3D Linear Transformations 


The linear 3D transforms are an extension of the 2D transforms. For example, a
scale along Cartesian axes is

$$\text{scale}(s_x, s_y, s_z) = \begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & s_z \end{bmatrix}. \tag{6.3}$$

Rotation is considerably more complicated in 3D than in 2D, because there are
more possible axes of rotation. However, if we simply want to rotate about the
z-axis, which will only change x- and y-coordinates, we can use the 2D rotation
matrix with no operation on z:

$$\text{rotate-z}(\phi) = \begin{bmatrix} \cos\phi & -\sin\phi & 0 \\ \sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

Similarly we can construct matrices to rotate about the x-axis and the y-axis:

$$\text{rotate-x}(\phi) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\phi & -\sin\phi \\ 0 & \sin\phi & \cos\phi \end{bmatrix},$$

$$\text{rotate-y}(\phi) = \begin{bmatrix} \cos\phi & 0 & \sin\phi \\ 0 & 1 & 0 \\ -\sin\phi & 0 & \cos\phi \end{bmatrix}.$$


To understand why the minus sign is in the lower left for the y-axis rotation, think
of the three axes in a circular sequence: y after x; z after y; x after z.

We will discuss rotations about arbitrary axes in the next section. 
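The circular pattern of the three axis rotations can be confirmed with a short Python sketch: a quarter turn about z takes x to y, a quarter turn about x takes y to z, and a quarter turn about y takes z to x. The function names are assumptions chosen to mirror the notation above.

```python
import math


def rotate_x(phi):
    c, s = math.cos(phi), math.sin(phi)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rotate_y(phi):
    c, s = math.cos(phi), math.sin(phi)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]   # minus sign in the lower left


def rotate_z(phi):
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def apply(m, v):
    """Multiply a 3x3 matrix by a 3D vector."""
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))


quarter = math.pi / 2
x_axis, y_axis, z_axis = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)

x_to_y = apply(rotate_z(quarter), x_axis)   # about z: x goes to y
y_to_z = apply(rotate_x(quarter), y_axis)   # about x: y goes to z
z_to_x = apply(rotate_y(quarter), z_axis)   # about y: z goes to x
print(x_to_y, y_to_z, z_to_x)
```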

As in two dimensions, we can shear along a particular axis, for example,

$$\text{shear-x}(d_y, d_z) = \begin{bmatrix} 1 & d_y & d_z \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$


As with 2D transforms, any 3D transformation matrix can be decomposed using 
SVD into a rotation, scale, and another rotation. Any symmetric 3D matrix has 
an eigenvalue decomposition into rotation, scale, and inverse-rotation. Finally, a 
3D rotation can be decomposed into a product of 3D shear matrices. 


6.2.1 Arbitrary 3D Rotations 

As in 2D, 3D rotations are orthogonal matrices. Geometrically, this means that
the three rows of the matrix are the Cartesian coordinates of three mutually-orthogonal
unit vectors as discussed in Section 2.4.5. The columns are three,
potentially different, mutually-orthogonal unit vectors. There are an infinite
number of such rotation matrices. Let’s write down such a matrix:

$$\mathbf{R}_{uvw} = \begin{bmatrix} x_u & y_u & z_u \\ x_v & y_v & z_v \\ x_w & y_w & z_w \end{bmatrix}.$$

Here, $\mathbf{u} = x_u\mathbf{x} + y_u\mathbf{y} + z_u\mathbf{z}$ and so on for v and w. Since the three vectors are
orthonormal we know that

$$\begin{aligned}
\mathbf{u}\cdot\mathbf{u} = \mathbf{v}\cdot\mathbf{v} = \mathbf{w}\cdot\mathbf{w} &= 1, \\
\mathbf{u}\cdot\mathbf{v} = \mathbf{v}\cdot\mathbf{w} = \mathbf{w}\cdot\mathbf{u} &= 0.
\end{aligned}$$


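We can infer some of the behavior of the rotation matrix by applying it to the
vectors u, v and w. For example,

$$\mathbf{R}_{uvw}\mathbf{u} =
\begin{bmatrix} x_u & y_u & z_u \\ x_v & y_v & z_v \\ x_w & y_w & z_w \end{bmatrix}
\begin{bmatrix} x_u \\ y_u \\ z_u \end{bmatrix} =
\begin{bmatrix} x_u x_u + y_u y_u + z_u z_u \\ x_v x_u + y_v y_u + z_v z_u \\ x_w x_u + y_w y_u + z_w z_u \end{bmatrix}.$$

Note that those three rows of $\mathbf{R}_{uvw}\mathbf{u}$ are all dot products:

$$\mathbf{R}_{uvw}\mathbf{u} =
\begin{bmatrix} \mathbf{u}\cdot\mathbf{u} \\ \mathbf{v}\cdot\mathbf{u} \\ \mathbf{w}\cdot\mathbf{u} \end{bmatrix} =
\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = \mathbf{x}.$$

Similarly, $\mathbf{R}_{uvw}\mathbf{v} = \mathbf{y}$, and $\mathbf{R}_{uvw}\mathbf{w} = \mathbf{z}$. So $\mathbf{R}_{uvw}$ takes the basis uvw to the
corresponding Cartesian axes via rotation.

If $\mathbf{R}_{uvw}$ is a rotation matrix with orthonormal rows, then $\mathbf{R}_{uvw}^T$ is also a
rotation matrix with orthonormal columns, and in fact is the inverse of $\mathbf{R}_{uvw}$ (the
inverse of an orthogonal matrix is always its transpose). An important point is that
for transformation matrices, the algebraic inverse is also the geometric inverse. So
if $\mathbf{R}_{uvw}$ takes u to x, then $\mathbf{R}_{uvw}^T$ takes x to u. The same should be true of v and
y as we can confirm: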


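$$\mathbf{R}_{uvw}^T\mathbf{y} =
\begin{bmatrix} x_u & x_v & x_w \\ y_u & y_v & y_w \\ z_u & z_v & z_w \end{bmatrix}
\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} =
\begin{bmatrix} x_v \\ y_v \\ z_v \end{bmatrix} = \mathbf{v}.$$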

So we can always create rotation matrices from orthonormal bases. 

If we wish to rotate about an arbitrary vector a, we can form an orthonormal
basis with w = a, rotate that basis to the canonical basis xyz, rotate about the
z-axis, and then rotate the canonical basis back to the uvw basis. In matrix form,
to rotate about the w-axis by an angle $\phi$:

$$\begin{bmatrix} x_u & x_v & x_w \\ y_u & y_v & y_w \\ z_u & z_v & z_w \end{bmatrix}
\begin{bmatrix} \cos\phi & -\sin\phi & 0 \\ \sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x_u & y_u & z_u \\ x_v & y_v & z_v \\ x_w & y_w & z_w \end{bmatrix}.$$

Here we have w a unit vector in the direction of a (i.e. a divided by its own
length). But what are u and v? A method to find reasonable u and v is given in
Section 2.4.6.
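The rotate-to-axes, rotate-about-z, rotate-back recipe can be sketched in Python as follows. The basis construction used here (replace the smallest-magnitude component of w by 1 to get a non-parallel helper vector t, then take cross products) is an assumption standing in for the method of Section 2.4.6, and all function names are hypothetical.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(a):
    n = math.sqrt(sum(x * x for x in a))
    return tuple(x / n for x in a)


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply(m, v):
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))


def rotate_about_axis(a, phi):
    """Rotation by phi radians about the (nonzero) axis vector a."""
    w = normalize(a)
    # Assumed helper: t differs from w in its smallest-magnitude component,
    # so t is never parallel to w.
    t = list(w)
    t[min(range(3), key=lambda i: abs(w[i]))] = 1.0
    u = normalize(cross(t, w))
    v = cross(w, u)                          # u, v, w is right-handed
    r_uvw = [list(u), list(v), list(w)]      # rows are the basis vectors
    r_uvw_t = [list(col) for col in zip(*r_uvw)]
    c, s = math.cos(phi), math.sin(phi)
    rot_z = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    return matmul(r_uvw_t, matmul(rot_z, r_uvw))


# A 120-degree turn about the (1, 1, 1) diagonal cycles x -> y -> z -> x.
R = rotate_about_axis((1.0, 1.0, 1.0), 2.0 * math.pi / 3.0)
x_image = apply(R, (1.0, 0.0, 0.0))
axis_image = apply(R, (1.0, 1.0, 1.0))
print(x_image, axis_image)
```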


If we have a rotation matrix and we wish to have the rotation in axis-angle
form, we can compute the one real eigenvalue (which will be λ = 1), and the
corresponding eigenvector is the axis of rotation. This is the one axis that is not
changed by the rotation.

See Chapter 17 for a comparison of the few most-used ways to represent
rotations, besides rotation matrices.


6.2.2 Transforming Normal Vectors 

While most 3D vectors we use represent positions (offset vectors from the origin)
or directions, such as where light comes from, some vectors represent surface
normals. Surface normal vectors are perpendicular to the tangent plane of a
surface. These normals do not transform the way we would like when the underlying
surface is transformed. For example, if the points of a surface are transformed by
a matrix M, a vector t that is tangent to the surface and is multiplied by M will
be tangent to the transformed surface. However, a surface normal vector n that is
transformed by M may not be normal to the transformed surface (Figure 6.17).

We can derive a transform matrix N which does take n to a vector
perpendicular to the transformed surface. One way to attack this issue is to note that a
surface normal vector and a tangent vector are perpendicular, so their dot product
is zero, which is expressed in matrix form as

$$\mathbf{n}^T\mathbf{t} = 0. \tag{6.4}$$

Figure 6.17. When a normal vector is transformed using the same matrix that transforms
the points on an object, the resulting vector may not be perpendicular to the surface as is
shown here for the sheared rectangle. The tangent vector, however, does transform to a
vector tangent to the transformed surface.

If we denote the desired transformed vectors as $\mathbf{t}_M = \mathbf{M}\mathbf{t}$ and $\mathbf{n}_N = \mathbf{N}\mathbf{n}$,
our goal is to find N such that $\mathbf{n}_N^T\mathbf{t}_M = 0$. We can find N by some algebraic
tricks. First, we can sneak an identity matrix into the dot product, and then take
advantage of $\mathbf{M}^{-1}\mathbf{M} = \mathbf{I}$:

$$\mathbf{n}^T\mathbf{t} = \mathbf{n}^T\mathbf{I}\mathbf{t} = \mathbf{n}^T\mathbf{M}^{-1}\mathbf{M}\mathbf{t} = 0.$$

Although the manipulations above don't obviously get us anywhere, note that we
can add parentheses that make the above expression more obviously a dot product:

$$\left(\mathbf{n}^T\mathbf{M}^{-1}\right)(\mathbf{M}\mathbf{t}) = \left(\mathbf{n}^T\mathbf{M}^{-1}\right)\mathbf{t}_M = 0.$$

This means that the row vector that is perpendicular to $\mathbf{t}_M$ is the left part of the
expression above. This expression holds for any of the tangent vectors in the
tangent plane. Since there is only one direction in 3D (and its opposite) that
is perpendicular to all such tangent vectors, we know that the left part of the
expression above must be the row vector expression for $\mathbf{n}_N$, i.e., it is $\mathbf{n}_N^T$, so this
allows us to infer N:

$$\mathbf{n}_N^T = \mathbf{n}^T\mathbf{M}^{-1},$$

so we can take the transpose of that to get

$$\mathbf{n}_N = \left(\mathbf{n}^T\mathbf{M}^{-1}\right)^T = \left(\mathbf{M}^{-1}\right)^T\mathbf{n}. \tag{6.5}$$

Therefore, we can see that the matrix which correctly transforms normal vectors
so they remain normal is $\mathbf{N} = (\mathbf{M}^{-1})^T$, i.e., the transpose of the inverse matrix.
Since this matrix may change the length of n, we can multiply it by an arbitrary
scalar and it will still produce $\mathbf{n}_N$ with the right direction. Recall from Section 5.3
that the inverse of a matrix is the transpose of the cofactor matrix divided by the
determinant. Because we don’t care about the length of a normal vector, we can
skip the division and find that for a 3 × 3 matrix,


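$$\mathbf{N} = \begin{bmatrix}
m^c_{11} & m^c_{12} & m^c_{13} \\
m^c_{21} & m^c_{22} & m^c_{23} \\
m^c_{31} & m^c_{32} & m^c_{33}
\end{bmatrix}.$$

This assumes the element of M in row i and column j is $m_{ij}$. So the full
expression for N is

$$\mathbf{N} = \begin{bmatrix}
m_{22}m_{33} - m_{23}m_{32} & m_{23}m_{31} - m_{21}m_{33} & m_{21}m_{32} - m_{22}m_{31} \\
m_{13}m_{32} - m_{12}m_{33} & m_{11}m_{33} - m_{13}m_{31} & m_{12}m_{31} - m_{11}m_{32} \\
m_{12}m_{23} - m_{13}m_{22} & m_{13}m_{21} - m_{11}m_{23} & m_{11}m_{22} - m_{12}m_{21}
\end{bmatrix}.$$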
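The cofactor form of N can be tested on the sheared-rectangle situation described above. The Python sketch below builds the cofactor matrix of a hypothetical x-shear, then compares the dot products of the transformed tangent with the normal transformed by M (wrong) and by N (right). The function names and the specific shear, tangent, and normal are assumptions for illustration.

```python
def cofactor_matrix(m):
    """Cofactor matrix of a 3x3 matrix M; it equals det(M) times (M^-1)^T."""
    def minor(i, j):
        rows = [r for r in range(3) if r != i]
        cols = [c for c in range(3) if c != j]
        return (m[rows[0]][cols[0]] * m[rows[1]][cols[1]]
                - m[rows[0]][cols[1]] * m[rows[1]][cols[0]])
    return [[(-1) ** (i + j) * minor(i, j) for j in range(3)] for i in range(3)]


def apply(m, v):
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


M = [[1.0, 1.0, 0.0],      # hypothetical shear: x grows with y
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
t = (0.0, 1.0, 0.0)        # tangent along the rectangle's vertical side
n = (1.0, 0.0, 0.0)        # normal of that side

N = cofactor_matrix(M)
t_M = apply(M, t)
wrong = dot(apply(M, n), t_M)   # nonzero: M n is not perpendicular to t_M
right = dot(apply(N, n), t_M)   # zero: N n stays perpendicular to t_M
print(N, wrong, right)
```

Since the division by the determinant is skipped, the result generally needs renormalizing before it is used as a unit normal.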


6.3 Translation and Affine Transformations 

We have been looking at methods to change vectors using a matrix M. In two
dimensions, these transforms have the form

$$\begin{aligned}
x' &= m_{11}x + m_{12}y, \\
y' &= m_{21}x + m_{22}y.
\end{aligned}$$

We cannot use such transforms to move objects, only to scale and rotate them. In
particular, the origin (0, 0) always remains fixed under a linear transformation. To
move, or translate, an object by shifting all its points the same amount, we need
a transform of the form

$$\begin{aligned}
x' &= x + x_t, \\
y' &= y + y_t.
\end{aligned}$$
There is just no way to do that by multiplying (x,y) by a 2 x 2 matrix. One 
possibility for adding translation to our system of linear transformations is to 
simply associate a separate translation vector with each transformation matrix, 
letting the matrix take care of scaling and rotation and the vector take care of 
translation. This is perfectly feasible, but the bookkeeping is awkward and the 
rule for composing two transformations is not as simple and clean as with linear 
transformations. 

Instead, we can use a clever trick to get a single matrix multiplication to do
both operations together. The idea is simple: represent the point (x, y) by a 3D
vector $[x\ y\ 1]^T$, and use 3 × 3 matrices of the form

$$\begin{bmatrix} m_{11} & m_{12} & x_t \\ m_{21} & m_{22} & y_t \\ 0 & 0 & 1 \end{bmatrix}.$$

The fixed third row serves to copy the 1 into the transformed vector, so that all
vectors have a 1 in the last place, and the first two rows compute x' and y' as
linear combinations of x, y, and 1:

$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} =
\begin{bmatrix} m_{11} & m_{12} & x_t \\ m_{21} & m_{22} & y_t \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} =
\begin{bmatrix} m_{11}x + m_{12}y + x_t \\ m_{21}x + m_{22}y + y_t \\ 1 \end{bmatrix}.$$

The single matrix implements a linear transformation followed by a translation!
This kind of transformation is called an affine transformation, and this way of
implementing affine transformations by adding an extra dimension is called
homogeneous coordinates (Roberts, 1965; Riesenfeld, 1981; Penna & Patterson,
1986). Homogeneous coordinates not only clean up the code for transformations,
but this scheme also makes it obvious how to compose two affine transformations:
simply multiply the matrices.
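A minimal Python sketch of this idea follows: a 3 × 3 matrix with a fixed bottom row applies a linear part and a translation in one multiplication. The function names and the particular numbers are assumptions chosen for the example.

```python
def affine(m11, m12, m21, m22, xt, yt):
    """3x3 homogeneous matrix: linear part upper-left, translation in the last column."""
    return [[m11, m12, xt],
            [m21, m22, yt],
            [0.0, 0.0, 1.0]]


def apply(m, v):
    """Multiply a 3x3 matrix by a homogeneous 3-vector."""
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))


M = affine(2.0, 0.0, 0.0, 2.0, 5.0, -1.0)   # hypothetical: scale by 2, then move by (5, -1)
p = (1.0, 3.0, 1.0)                         # the point (1, 3) with a 1 appended
q = apply(M, p)                             # x' = 2*1 + 5, y' = 2*3 - 1, last entry stays 1
print(q)
```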

A problem with this new formalism arises when we need to transform vectors
that are not supposed to be positions: they represent directions, or offsets
between positions. Vectors that represent directions or offsets should not change
when we translate an object. Fortunately, we can arrange for this by setting the
third coordinate to zero:

$$\begin{bmatrix} 1 & 0 & x_t \\ 0 & 1 & y_t \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 0 \end{bmatrix} =
\begin{bmatrix} x \\ y \\ 0 \end{bmatrix}.$$

If there is a scaling/rotation transformation in the upper-left 2x2 entries of the 
matrix, it will apply to the vector, but the translation still multiplies with the zero 
and is ignored. Furthermore, the zero is copied into the transformed vector, so 
direction vectors remain direction vectors after they are transformed. 

This is exactly the behavior we want for vectors, so they fit smoothly into the
system: the extra (third) coordinate will be either 1 or 0 depending on whether we
are encoding a position or a direction. We actually do need to store the
homogeneous coordinate so we can distinguish between locations and other vectors. For
example,

$$\begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix} \text{ is a location and }
\begin{bmatrix} 3 \\ 2 \\ 0 \end{bmatrix} \text{ is a displacement or direction.}$$


Later, when we do perspective viewing, we will see that it is useful to allow the 
homogeneous coordinate to take on values other than one or zero. 
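The difference between locations and directions is easy to see in code. In the Python sketch below (function names and numbers are illustrative assumptions), the same translation matrix moves a location but leaves a direction untouched, and the difference of two locations comes out with a 0 in the last slot.

```python
def translate(xt, yt):
    """Homogeneous 2D translation matrix."""
    return [[1.0, 0.0, xt],
            [0.0, 1.0, yt],
            [0.0, 0.0, 1.0]]


def apply(m, v):
    """Multiply a 3x3 matrix by a homogeneous 3-vector."""
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))


T = translate(4.0, 7.0)          # hypothetical translation
location = (3.0, 2.0, 1.0)       # last coordinate 1: a position
direction = (3.0, 2.0, 0.0)      # last coordinate 0: a displacement or direction

moved = apply(T, location)       # the translation is applied
unchanged = apply(T, direction)  # the translation multiplies the 0 and drops out

# Subtracting two locations yields a displacement (last coordinate 0).
offset = tuple(a - b for a, b in zip(moved, location))
print(moved, unchanged, offset)
```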

Homogeneous coordinates are used nearly universally to represent
transformations in graphics systems. In particular, homogeneous coordinates underlie the
design and operation of renderers implemented in graphics hardware. We will see
in Chapter 7 that homogeneous coordinates also make it easy to draw scenes in
perspective, another reason for their popularity.

Homogeneous coordinates can be considered just a clever way to handle the 
bookkeeping for translation, but there is also a different, geometric interpretation. 
The key observation is that when we do a 3D shear based on the z-coordinate we 
get this transform: 

$$\begin{bmatrix} 1 & 0 & x_t \\ 0 & 1 & y_t \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x + x_t z \\ y + y_t z \\ z \end{bmatrix}.$$


Note that this almost has the form we want in x and y for a 2D translation, but 
has a z hanging around that doesn’t have a meaning in 2D. Now comes the key 
decision: we will add a coordinate z = 1 to all 2D locations. This gives us 

$$\begin{bmatrix} 1 & 0 & x_t \\ 0 & 1 & y_t \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} x + x_t \\ y + y_t \\ 1 \end{bmatrix}.$$

This gives an explanation for the name “homogeneous”: translation, rotation, 
and scaling of positions and directions all fit into a single system. 

Homogeneous coordinates are also ubiquitous in computer vision. 


By associating a (z = 1)-coordinate with all 2D points, we now can encode 
translations into matrix form. For example, to first translate in 2D by $(x_t, y_t)$ and then 
rotate by angle $\phi$ we would use the matrix 

$$\mathbf{M} = \begin{bmatrix} \cos\phi & -\sin\phi & 0 \\ \sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & x_t \\ 0 & 1 & y_t \\ 0 & 0 & 1 \end{bmatrix}.$$


Note that the 2D rotation matrix is now 3 × 3 with zeros in the “translation slots.” 
With this type of formalism, which uses shears along z = 1 to encode translations, 
we can represent any number of 2D shears, 2D rotations, and 2D translations as 
one composite 3D matrix. The bottom row of that matrix will always be (0, 0, 1), 
so we don't really have to store it. We just need to remember it is there when we 
multiply two matrices together. 
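As an illustration of composing a translation and a rotation into one matrix, here is a minimal sketch in plain Python. The helper names (`mat_mul`, `mat_vec`, `translate2d`, `rotate2d`) and the sample values are assumptions chosen for this example only.

```python
import math

def mat_mul(a, b):
    """Product of two 3x3 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-component homogeneous vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def translate2d(xt, yt):
    return [[1, 0, xt], [0, 1, yt], [0, 0, 1]]

def rotate2d(phi):
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

# Translate first, then rotate: the rotation goes on the left because
# the right-hand matrix is the one applied to the point first.
M = mat_mul(rotate2d(math.pi / 2), translate2d(1, 0))

# (1, 0) is translated to (2, 0), then rotated by 90 degrees to
# approximately (0, 2), up to floating-point rounding.
p = mat_vec(M, [1, 0, 1])
```

Swapping the order of the two factors would rotate first and translate second, which in general gives a different matrix.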

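In 3D, the same technique works: we can add a fourth coordinate, a homogeneous 
coordinate, and then we have translations: 

$$\begin{bmatrix} 1 & 0 & 0 & x_t \\ 0 & 1 & 0 & y_t \\ 0 & 0 & 1 & z_t \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} = \begin{bmatrix} x + x_t \\ y + y_t \\ z + z_t \\ 1 \end{bmatrix}.$$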


Again, for a direction vector, the fourth coordinate is zero and the vector is thus 
unaffected by translations. 

Example (Windowing transformations). Often in graphics we need to create a 
transform matrix that takes points in the rectangle $[x_l, x_h] \times [y_l, y_h]$ to the rectangle 
$[x'_l, x'_h] \times [y'_l, y'_h]$. This can be accomplished with a single scale and translate in 
sequence. However, it is more intuitive to create the transform from a sequence 
of three operations (Figure 6.18): 

1. Move the point $(x_l, y_l)$ to the origin. 

2. Scale the rectangle to be the same size as the target rectangle. 

3. Move the origin to point $(x'_l, y'_l)$. 

Figure 6.18. To take one rectangle (window) to the other, we first shift the lower-left corner 
to the origin, then scale it to the new size, and then move the origin to the lower-left corner 
of the target rectangle. 


Remembering that the right-hand matrix is applied first, we can write 


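$$\begin{aligned}
\text{window} &= \text{translate}(x'_l, y'_l)\;\text{scale}\!\left(\frac{x'_h - x'_l}{x_h - x_l}, \frac{y'_h - y'_l}{y_h - y_l}\right)\text{translate}(-x_l, -y_l) \\
&= \begin{bmatrix} 1 & 0 & x'_l \\ 0 & 1 & y'_l \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \frac{x'_h - x'_l}{x_h - x_l} & 0 & 0 \\ 0 & \frac{y'_h - y'_l}{y_h - y_l} & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & -x_l \\ 0 & 1 & -y_l \\ 0 & 0 & 1 \end{bmatrix} \\
&= \begin{bmatrix} \frac{x'_h - x'_l}{x_h - x_l} & 0 & \frac{x'_l x_h - x'_h x_l}{x_h - x_l} \\ 0 & \frac{y'_h - y'_l}{y_h - y_l} & \frac{y'_l y_h - y'_h y_l}{y_h - y_l} \\ 0 & 0 & 1 \end{bmatrix}.
\end{aligned} \tag{6.6}$$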


It is perhaps not surprising to some readers that the resulting matrix has the form 
it does, but the constructive process with the three matrices leaves no doubt as to 
the correctness of the result. 
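The closed form of the 2D windowing matrix translates directly into code. The sketch below is plain Python under assumed names: `window2d`, `mat_vec`, and the sample rectangles are illustrative and do not come from the text.

```python
def window2d(xl, xh, yl, yh, xl2, xh2, yl2, yh2):
    """3x3 matrix mapping [xl, xh] x [yl, yh] to [xl2, xh2] x [yl2, yh2].

    Entries follow the product translate * scale * translate written out
    in closed form; the source rectangle must have nonzero width and height.
    """
    sx = (xh2 - xl2) / (xh - xl)
    sy = (yh2 - yl2) / (yh - yl)
    return [[sx, 0.0, (xl2 * xh - xh2 * xl) / (xh - xl)],
            [0.0, sy, (yl2 * yh - yh2 * yl) / (yh - yl)],
            [0.0, 0.0, 1.0]]

def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-component homogeneous vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

# Map the rectangle [0, 4] x [0, 2] onto [10, 18] x [-1, 3].
W = window2d(0.0, 4.0, 0.0, 2.0, 10.0, 18.0, -1.0, 3.0)
lower_left = mat_vec(W, [0.0, 0.0, 1.0])    # lands on (10, -1)
upper_right = mat_vec(W, [4.0, 2.0, 1.0])   # lands on (18, 3)
```

Checking that the two opposite corners land where expected is a quick sanity test for any windowing matrix, since an axis-aligned scale and translate is determined by where those corners go.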

An exactly analogous construction can be used to define a 3D windowing 
transformation, which maps the box $[x_l, x_h] \times [y_l, y_h] \times [z_l, z_h]$ to the box 
$[x'_l, x'_h] \times [y'_l, y'_h] \times [z'_l, z'_h]$: 

$$\begin{bmatrix}
\frac{x'_h - x'_l}{x_h - x_l} & 0 & 0 & \frac{x'_l x_h - x'_h x_l}{x_h - x_l} \\
0 & \frac{y'_h - y'_l}{y_h - y_l} & 0 & \frac{y'_l y_h - y'_h y_l}{y_h - y_l} \\
0 & 0 & \frac{z'_h - z'_l}{z_h - z_l} & \frac{z'_l z_h - z'_h z_l}{z_h - z_l} \\
0 & 0 & 0 & 1
\end{bmatrix}. \tag{6.7}$$


It is interesting to note that if we multiply an arbitrary matrix composed of 
scales, shears, and rotations with a simple translation (translation comes second), 
we get 


$$\begin{bmatrix} 1 & 0 & 0 & x_t \\ 0 & 1 & 0 & y_t \\ 0 & 0 & 1 & z_t \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} a_{11} & a_{12} & a_{13} & 0 \\ a_{21} & a_{22} & a_{23} & 0 \\ a_{31} & a_{32} & a_{33} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} a_{11} & a_{12} & a_{13} & x_t \\ a_{21} & a_{22} & a_{23} & y_t \\ a_{31} & a_{32} & a_{33} & z_t \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$

Thus, we can look at any matrix and think of it as a scaling/rotation part and a 
translation part because the components are nicely separated from each other. 

Rigid-body transforms are an important class of transforms. These are composed 
only of translations and rotations, so they have no stretching or shrinking 
of the objects. Such transforms will have a pure rotation for the $a_{ij}$ above. 

6.4 Inverses of Transformation Matrices 

While we can always invert a matrix algebraically, we can use geometry if we 
know what the transform does. For example, the inverse of scale($s_x, s_y, s_z$) is 
scale($1/s_x, 1/s_y, 1/s_z$). The inverse of a rotation is the same rotation with the 
opposite sign on the angle. The inverse of a translation is a translation in the 
opposite direction. If we have a series of matrices $\mathbf{M} = \mathbf{M}_1 \mathbf{M}_2 \cdots \mathbf{M}_n$ then 
$\mathbf{M}^{-1} = \mathbf{M}_n^{-1} \cdots \mathbf{M}_2^{-1} \mathbf{M}_1^{-1}$. 
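To make the rule for inverting a product concrete, the following sketch builds a 2D transform from a scale, a rotation, and a translation, then assembles its inverse from the geometric inverses in reverse order. It is plain Python; the helper names and the numeric values are assumptions for illustration.

```python
import math

def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def translate2d(xt, yt):
    return [[1.0, 0.0, xt], [0.0, 1.0, yt], [0.0, 0.0, 1.0]]

def rotate2d(phi):
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def scale2d(sx, sy):
    return [[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, 1.0]]

# M = translate * rotate * scale (the scale is applied to points first).
M = mat_mul(translate2d(3.0, 1.0),
            mat_mul(rotate2d(0.7), scale2d(2.0, 0.5)))

# Inverse: each factor inverted geometrically, in the opposite order.
M_inv = mat_mul(scale2d(1 / 2.0, 1 / 0.5),
                mat_mul(rotate2d(-0.7), translate2d(-3.0, -1.0)))

product = mat_mul(M, M_inv)  # identity, up to floating-point rounding
```

No general-purpose matrix inversion routine is needed here, which is the practical payoff of knowing how the transform was built.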

Also, certain types of transformation matrices are easy to invert. We’ve already 
mentioned scales, which are diagonal matrices; the second important example 
is rotations, which are orthogonal matrices. Recall (Section 5.2.4) that the 
inverse of an orthogonal matrix is its transpose. This makes it easy to invert 
rotations and rigid body transformations (see Exercise 6). Also, it’s useful to know 
that a matrix with [0 0 0 1] in the bottom row has an inverse that also has [0 0 0 1] 
in the bottom row (see Exercise 7). 

Interestingly, we can use SVD to invert a matrix as well. Since we know 
that any matrix can be decomposed into a rotation times a scale times a rotation, 
inversion is straightforward. For example in 3D we have 

$$\mathbf{M} = \mathbf{R}_1\,\text{scale}(\sigma_1, \sigma_2, \sigma_3)\,\mathbf{R}_2,$$

and from the rules above it follows easily that 

$$\mathbf{M}^{-1} = \mathbf{R}_2^T\,\text{scale}(1/\sigma_1, 1/\sigma_2, 1/\sigma_3)\,\mathbf{R}_1^T.$$
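The intermediate steps can be written out using the product rule for inverses together with the fact that the inverse of a rotation is its transpose. This short derivation assumes all three singular values are nonzero, since otherwise the matrix is singular and has no inverse:

$$\begin{aligned}
\mathbf{M}^{-1} &= \left(\mathbf{R}_1\,\text{scale}(\sigma_1, \sigma_2, \sigma_3)\,\mathbf{R}_2\right)^{-1} \\
&= \mathbf{R}_2^{-1}\,\text{scale}(\sigma_1, \sigma_2, \sigma_3)^{-1}\,\mathbf{R}_1^{-1} \\
&= \mathbf{R}_2^T\,\text{scale}(1/\sigma_1, 1/\sigma_2, 1/\sigma_3)\,\mathbf{R}_1^T.
\end{aligned}$$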

6.5 Coordinate Transformations 

All of the previous discussion has been in terms of using transformation matrices 
to move points around. We can also think of them as simply changing the 
coordinate system in which the point is represented. For example, in Figure 6.19, we 
see two ways to visualize a movement. In different contexts, either interpretation 
may be more suitable. 

For example, a driving game may have a model of a city and a model of 
a car. If the player is presented with a view out the windshield, objects inside 
the car are always drawn in the same place on the screen, while the streets and 
buildings appear to move backward as the player drives. On each frame, we 
apply a transformation to these objects that moves them farther back than on the 
previous frame. One way to think of this operation is simply that it moves the 
buildings backward; another way to think of it is that the buildings are staying put 
but the coordinate system in which we want to draw them (which is attached to 
the car) is moving. In the second interpretation, the transformation is changing 
the coordinates of the city geometry, expressing them as coordinates in the car’s 
coordinate system. Both ways will lead to exactly the same matrix that is applied 
to the geometry outside the car. 

Figure 6.19. The point (2,1) has a transform “translate by (-1,0)” applied to it. On the top 
right is our mental image if we view this transformation as a physical movement, and on the 
bottom right is our mental image if we view it as a change of coordinates (a movement of the 
origin in this case). The artificial boundary is just an artifice, and the relative position of the 
axes and the point are the same in either case. 

In 2D, of course, there are two basis vectors. 

In 2D, right-handed means y is counterclockwise from x. 

If the game also supports an overhead view to show where the car is in the 
city, the buildings and streets need to be drawn in fixed positions while the car 
needs to move from frame to frame. The same two interpretations apply: we 
can think of the changing transformation as moving the car from its canonical 
position to its current location in the world; or we can think of the transformation 
as simply changing the coordinates of the car’s geometry, which is originally 
expressed in terms of a coordinate system attached to the car, to express them 
instead in a coordinate system fixed relative to the city. The change-of-coordinates 
interpretation makes it clear that the matrices used in these two modes (city-to-car 
coordinate change vs. car-to-city coordinate change) are inverses of one another. 

The idea of changing coordinate systems is much like the idea of type conversions 
in programming. Before we can add a floating-point number to an integer, 
we need to convert the integer to floating point or the floating-point number to an 
integer, depending on our needs, so that the types match. And before we can draw 
the city and the car together, we need to convert the city to car coordinates or the 
car to city coordinates, depending on our needs, so that the coordinates match. 

When managing multiple coordinate systems, it’s easy to get confused and 
wind up with objects in the wrong coordinates, causing them to show up in 
unexpected places. But with systematic thinking about transformations between 
coordinate systems, you can reliably get the transformations right. 

Geometrically, a coordinate system, or coordinate frame, consists of an origin 
and a basis, which is a set of three vectors. Orthonormal bases are so convenient that 
we’ll normally assume frames are orthonormal unless otherwise specified. In a 
frame with origin p and basis {u, v, w}, the coordinates (u, v, w) describe the 
point 

$$\mathbf{p} + u\mathbf{u} + v\mathbf{v} + w\mathbf{w}.$$

When we store these vectors in the computer, they need to be represented in 
terms of some coordinate system. To get things started, we have to designate 
some canonical coordinate system, often called “global” or “world” coordinates, 
which is used to describe all other systems. In the city example, we might adopt 
the street grid and use the convention that the x-axis points along Main Street, 
the y-axis points up, and the z-axis points along Central Avenue. Then when we 
write the origin and basis of the car frame in terms of these coordinates it is clear 
what we mean. 

In 2D our convention is to use the point o for the origin, and x and y for 
the right-handed orthonormal basis vectors (Figure 6.20). 

Figure 6.20. The point p can be represented in terms of either coordinate system. 


Another coordinate system might have an origin e and right-handed orthonormal 
basis vectors u and v. Note that typically the canonical data o, x, and y are 
never stored explicitly. They are the frame-of-reference for all other coordinate 
systems. In that coordinate system, we often write down the location of p as an 
ordered pair, which is shorthand for a full vector expression: 

$$\mathbf{p} = (x_p, y_p) = \mathbf{o} + x_p\mathbf{x} + y_p\mathbf{y}.$$

For example, in Figure 6.20, $(x_p, y_p) = (2.5, 0.9)$. Note that the pair $(x_p, y_p)$ 
implicitly assumes the origin o. Similarly, we can express p in terms of another 
equation: 

$$\mathbf{p} = (u_p, v_p) = \mathbf{e} + u_p\mathbf{u} + v_p\mathbf{v}.$$

In Figure 6.20, this has $(u_p, v_p) = (0.5, -0.7)$. Again, the origin e is left as an 
implicit part of the coordinate system associated with u and v. 

We can express this same relationship using matrix machinery, like this: 


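$$\begin{bmatrix} x_p \\ y_p \\ 1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & x_e \\ 0 & 1 & y_e \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x_u & x_v & 0 \\ y_u & y_v & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} u_p \\ v_p \\ 1 \end{bmatrix}
= \begin{bmatrix} x_u & x_v & x_e \\ y_u & y_v & y_e \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} u_p \\ v_p \\ 1 \end{bmatrix}.$$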


Note that this assumes we have the point e and vectors u and v stored in canonical 
coordinates; the (x, y)-coordinate system is the first among equals. In terms of the 
basic types of transformations we’ve discussed in this chapter, this is a rotation 
(involving u and v) followed by a translation (involving e). Looking at the matrix 
for the rotation and translation together, you can see it’s very easy to write down: 
we just put u, v, and e into the columns of a matrix, with the usual [0 0 1] in the 
third row. To make this even clearer we can write the matrix like this: 

$$\mathbf{p}_{xy} = \begin{bmatrix} \mathbf{u} & \mathbf{v} & \mathbf{e} \\ 0 & 0 & 1 \end{bmatrix} \mathbf{p}_{uv}.$$


We call this matrix the frame-to-canonical matrix for the (u, v) frame. It takes 
points expressed in the (u, v) frame and converts them to the same points 
expressed in the canonical frame. 

The name “frame-to-canonical” is based on thinking about changing the 
coordinates of a vector from one system to another. Thinking in terms of 
moving vectors around, the frame-to-canonical matrix maps the canonical frame 
to the (u, v) frame. 


To go in the other direction we have 


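$$\begin{bmatrix} u_p \\ v_p \\ 1 \end{bmatrix}
= \begin{bmatrix} x_u & y_u & 0 \\ x_v & y_v & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & -x_e \\ 0 & 1 & -y_e \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x_p \\ y_p \\ 1 \end{bmatrix}.$$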


This is a translation followed by a rotation; they are the inverses of the rotation and 
translation we used to build the frame-to-canonical matrix, and when multiplied 
together they produce the inverse of the frame-to-canonical matrix, which is (not 
surprisingly) called the canonical-to-frame matrix: 


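$$\mathbf{p}_{uv} = \begin{bmatrix} \mathbf{u} & \mathbf{v} & \mathbf{e} \\ 0 & 0 & 1 \end{bmatrix}^{-1} \mathbf{p}_{xy}.$$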


The canonical-to-frame matrix takes points expressed in the canonical frame and 
converts them to the same points expressed in the (u, v) frame. We have written 
this matrix as the inverse of the frame-to-canonical matrix because it can’t 
immediately be written down using the canonical coordinates of e, u, and v. But 
remember that all coordinate systems are equivalent; it’s only our convention of 
storing vectors in terms of x- and y-coordinates that creates this seeming 
asymmetry. The canonical-to-frame matrix can be expressed simply in terms of the 
(u, v) coordinates of o, x, and y: 

$$\mathbf{p}_{uv} = \begin{bmatrix} \mathbf{x}_{uv} & \mathbf{y}_{uv} & \mathbf{o}_{uv} \\ 0 & 0 & 1 \end{bmatrix} \mathbf{p}_{xy}.$$
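The two matrices can be tried out on a small 2D example. The sketch below is plain Python; the function names and the particular frame (origin e = (2, 1), with u and v rotated 90 degrees from x and y) are assumptions made for illustration and are not the frame drawn in Figure 6.20. The inverse is formed as a transposed rotation times a translation by -e, which is valid only because u and v are orthonormal.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-component homogeneous vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def dot2(a, b):
    return a[0] * b[0] + a[1] * b[1]

def frame_to_canonical(u, v, e):
    """Columns are u, v, and e, with [0 0 1] as the bottom row."""
    return [[u[0], v[0], e[0]],
            [u[1], v[1], e[1]],
            [0.0, 0.0, 1.0]]

def canonical_to_frame(u, v, e):
    """Inverse of frame_to_canonical, assuming u and v are orthonormal."""
    return [[u[0], u[1], -dot2(u, e)],
            [v[0], v[1], -dot2(v, e)],
            [0.0, 0.0, 1.0]]

e = (2.0, 1.0)    # hypothetical frame origin in canonical coordinates
u = (0.0, 1.0)    # first basis vector
v = (-1.0, 0.0)   # second basis vector, counterclockwise from u

p_uv = [0.5, -0.7, 1.0]
p_xy = mat_vec(frame_to_canonical(u, v, e), p_uv)     # about (2.7, 1.5)
p_back = mat_vec(canonical_to_frame(u, v, e), p_xy)   # about (0.5, -0.7)
```

Converting to canonical coordinates and back returns the original (u, v) coordinates, which is what it means for the two matrices to be inverses.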


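All these ideas work strictly analogously in 3D, where we have 

$$\mathbf{p}_{xyz} = \begin{bmatrix} \mathbf{u} & \mathbf{v} & \mathbf{w} & \mathbf{e} \\ 0 & 0 & 0 & 1 \end{bmatrix} \mathbf{p}_{uvw}
\quad\text{and}\quad
\mathbf{p}_{uvw} = \begin{bmatrix} \mathbf{u} & \mathbf{v} & \mathbf{w} & \mathbf{e} \\ 0 & 0 & 0 & 1 \end{bmatrix}^{-1} \mathbf{p}_{xyz}.$$

Frequently Asked Questions 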

• Can’t I just hardcode transforms rather than use the matrix formalisms? 

Yes, but in practice it is harder to derive, harder to debug, and not any more 
efficient. Also, all current graphics APIs use this matrix formalism so it must be 
understood even to use graphics libraries. 

• The bottom row of the matrix is always (0,0,0,1). Do I have to store it? 

You do not have to store it unless you include perspective transforms (Chapter 7). 


Notes 

The derivation of the transformation properties of normals is based on Properties 
of Surface Normal Transformations (Turkowski, 1990). In many treatments 
through the mid-1990s, vectors were represented as row vectors and premultiplied, 
e.g., b = aM. In our notation this would be $\mathbf{b}^T = \mathbf{a}^T\mathbf{M}^T$. If you want to 
find a rotation matrix R that takes one vector a to a vector b of the same length: 
b = Ra you could use two rotations constructed from orthonormal bases. A more 
efficient method is given in Efficiently Building a Matrix to Rotate One Vector to 
Another (Akenine-Möller et al., 2008). 


Exercises 

1. Write down the 4 × 4 3D matrix to move by $(x_m, y_m, z_m)$. 

2. Write down the 4 × 4 3D matrix to rotate by an angle $\theta$ about the y-axis. 

3. Write down the 4 × 4 3D matrix to scale an object by 50% in all directions. 

4. Write the 2D rotation matrix that rotates by 90 degrees clockwise. 

5. Write the matrix from Exercise 4 as a product of three shear matrices. 

6. Find the inverse of the rigid body transformation: 

$$\begin{bmatrix} \mathbf{R} & \mathbf{t} \\ 0\ 0\ 0 & 1 \end{bmatrix}$$

where R is a 3 × 3 rotation matrix and t is a 3-vector. 

7. Show that the inverse of the matrix for an affine transformation (one that 
has all zeros in the bottom row except for a one in the lower right entry) 
also has the same form. 

8. Describe in words what this 2D transform matrix does: 

$$\begin{bmatrix} 0 & -1 & 1 \\ 1 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}.$$

9. Write down the 3 × 3 matrix that rotates a 2D point by angle $\theta$ about a point 
$\mathbf{p} = (x_p, y_p)$. 

10. Write down the 4 × 4 rotation matrix that takes the orthonormal 3D vectors 
$\mathbf{u} = (x_u, y_u, z_u)$, $\mathbf{v} = (x_v, y_v, z_v)$, and $\mathbf{w} = (x_w, y_w, z_w)$, to orthonormal 
3D vectors $\mathbf{a} = (x_a, y_a, z_a)$, $\mathbf{b} = (x_b, y_b, z_b)$, and $\mathbf{c} = (x_c, y_c, z_c)$. So 
$\mathbf{M}\mathbf{u} = \mathbf{a}$, $\mathbf{M}\mathbf{v} = \mathbf{b}$, and $\mathbf{M}\mathbf{w} = \mathbf{c}$. 

11. What is the inverse matrix for the answer to the previous problem? 

Viewing 


In the previous chapter we saw how to use matrix transformations as a tool for 
arranging geometric objects in 2D or 3D space. A second important use of 
geometric transformations is in moving objects between their 3D locations and their 
positions in a 2D view of the 3D world. This 3D to 2D mapping is called a viewing 
transformation, and it plays an important role in object-order rendering, in which 
we need to rapidly find the image-space location of each object in the scene. 

When we studied ray tracing in Chapter 4, we covered the different types of 
perspective and orthographic views and how to generate viewing rays according 
to any given view. This chapter is about the inverse of that process. Here we 
explain how to use matrix transformations to express any parallel or perspective 
view. The transformations in this chapter project 3D points in the scene (world 
space) to 2D points in the image (image space), and they will project any point on 
a given pixel’s viewing ray back to that pixel’s position in image space. 

If you have not looked at it recently, it is advisable to review the discussion of 
perspective and ray generation in Chapter 4 before reading this chapter. 

By itself, the ability to project points from the world to the image is only 
good for producing wireframe renderings, in which only the edges 
of objects are drawn, and closer surfaces do not occlude more distant surfaces 
(Figure 7.1). Just as a ray tracer needs to find the closest surface intersection 
along each viewing ray, an object-order renderer displaying solid-looking objects 
has to work out which of the (possibly many) surfaces drawn at any given point 
on the screen is closest and display only that one. In this chapter, we assume we 
are drawing a model consisting only of 3D line segments that are specified by 
the (x, y, z) coordinates of their two end points. Later chapters will discuss the 
machinery needed to produce renderings of solid surfaces. 

Figure 7.1. Left: wireframe cube in orthographic projection. Middle: wireframe cube in 
perspective projection. Right: perspective projection with hidden lines removed. 


7.1 Viewing Transformations 


Some APIs use “viewing transformation” for just the piece of our viewing 
transformation that we call the camera transformation. 

The viewing transformation has the job of mapping 3D locations, represented 
as (x, y, z) coordinates in the canonical coordinate system, to coordinates in the 
image, expressed in units of pixels. It is a complicated beast that depends on 
many different things, including the camera position and orientation, the type 
of projection, the field of view, and the resolution of the image. As with all 
complicated transformations it is best approached by breaking it up into a product 
of several simpler transformations. Most graphics systems do this by using a 
sequence of three transformations: 


• A camera transformation or eye transformation, which is a rigid body 
transformation that places the camera at the origin in a convenient orientation. 
It depends only on the position and orientation, or pose, of the camera. 

• A projection transformation, which projects points from camera space so 
that all visible points fall in the range -1 to 1 in x and y. It depends only 
on the type of projection desired. 

• A viewport transformation or windowing transformation, which maps this 
unit image rectangle to the desired rectangle in pixel coordinates. It 
depends only on the size and position of the output image. 

To make it easy to describe the stages of the process (Figure 7.2), we give names 
to the coordinate systems that are the inputs and outputs of these transformations. 
The camera transformation converts points in canonical coordinates (or world 
space) to camera coordinates or places them in camera space. The projection 
transformation moves points from camera space to the canonical view volume. 
Finally, the viewport transformation maps the canonical view volume to screen 
space. 

Figure 7.2. The sequence of spaces and transformations that gets objects from their 
original coordinates into screen space. 

Each of these transformations is individually quite simple. We’ll discuss them 
in detail for the orthographic case beginning with the viewport transformation, 
then cover the changes required to support perspective projection. 

Other names: camera space is also “eye space” and the camera transformation 
is sometimes the “viewing transformation”; the canonical view volume 
is also “clip space” or “normalized device coordinates”; screen space is 
also “pixel coordinates.” 


7.1.1 The Viewport Transformation 


We begin with a problem whose solution will be reused for any viewing condition. 
We assume that the geometry we want to view is in the canonical view volume, 
and we wish to view it with an orthographic camera looking in the -z direction. 
The canonical view volume is the cube containing all 3D points whose Cartesian 
coordinates are between -1 and +1, that is, $(x, y, z) \in [-1, 1]^3$ (Figure 7.3). 
We project x = -1 to the left side of the screen, x = +1 to the right side of the 
screen, y = -1 to the bottom of the screen, and y = +1 to the top of the screen. 

Recall the conventions for pixel coordinates from Chapter 3: each pixel “owns” 
a unit square centered at integer coordinates; the image boundaries have a 
half-unit overshoot from the pixel centers; and the smallest pixel center coordinates 
are (0, 0). If we are drawing into an image (or window on the screen) that has 
$n_x$ by $n_y$ pixels, we need to map the square $[-1, 1]^2$ to the rectangle 
$[-0.5, n_x - 0.5] \times [-0.5, n_y - 0.5]$. 

The word “canonical” crops up again; it means something arbitrarily chosen for 
convenience. For instance, the unit circle could be called the “canonical circle.” 

Mapping a square to a potentially non-square rectangle is not a problem; x and 
y just end up with different scale factors going from canonical to pixel 
coordinates. 

For now we will assume that all line segments to be drawn are completely 
inside the canonical view volume. Later we will relax that assumption when we 
discuss clipping. 

Since the viewport transformation maps one axis-aligned rectangle to another, 
it is a case of the windowing transform given by Equation (6.6): 



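$$\begin{bmatrix} x_{\text{screen}} \\ y_{\text{screen}} \\ 1 \end{bmatrix}
= \begin{bmatrix} \frac{n_x}{2} & 0 & \frac{n_x - 1}{2} \\ 0 & \frac{n_y}{2} & \frac{n_y - 1}{2} \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x_{\text{canonical}} \\ y_{\text{canonical}} \\ 1 \end{bmatrix}.$$

Figure 7.3. The canonical view volume is a cube with side of length two centered 
at the origin. 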


Note that this matrix ignores the z-coordinate of the points in the canonical view 
volume, because a point’s distance along the projection direction doesn’t affect 
where that point projects in the image. But before we officially call this the 
viewport matrix, we add a row and column to carry along the z-coordinate without 
changing it. We don’t need it in this chapter, but eventually we will need the z 
values because they can be used to make closer surfaces hide more distant surfaces 
(see Section 8.2.3). 

$$\mathbf{M}_{\text{vp}} = \begin{bmatrix} \frac{n_x}{2} & 0 & 0 & \frac{n_x - 1}{2} \\ 0 & \frac{n_y}{2} & 0 & \frac{n_y - 1}{2} \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}. \tag{7.2}$$
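A quick way to check the viewport matrix is to feed it the corners of the canonical square and confirm that they land on the image boundaries. This is a minimal sketch in plain Python; `viewport_matrix`, `mat_vec`, and the 640 by 480 image size are assumptions for the example.

```python
def viewport_matrix(nx, ny):
    """4x4 viewport matrix for an image of nx by ny pixels.

    Maps x, y in [-1, 1] to [-0.5, nx - 0.5] and [-0.5, ny - 0.5],
    and passes z through unchanged.
    """
    return [[nx / 2, 0.0, 0.0, (nx - 1) / 2],
            [0.0, ny / 2, 0.0, (ny - 1) / 2],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component homogeneous vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

M_vp = viewport_matrix(640, 480)
bottom_left = mat_vec(M_vp, [-1.0, -1.0, 0.25, 1.0])  # (-0.5, -0.5, 0.25, 1)
top_right = mat_vec(M_vp, [1.0, 1.0, 0.25, 1.0])      # (639.5, 479.5, 0.25, 1)
```

The half-pixel offsets in the results reflect the convention that pixel centers sit at integer coordinates, so the image boundary lies half a unit beyond the outermost centers.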


7.1.2 The Orthographic Projection Transformation 

Of course, we usually want to render geometry in some region of space other than 
the canonical view volume. Our first step in generalizing the view will keep the 
view direction and orientation fixed looking along -z with +y up, but will allow 
arbitrary rectangles to be viewed. Rather than replacing the viewport matrix, we’ll 
augment it by multiplying it with another matrix on the right. 

Under these constraints, the view volume is an axis-aligned box, and we’ll 
name the coordinates of its sides so that the view volume is $[l, r] \times [b, t] \times [f, n]$ 
shown in Figure 7.4. We call this box the orthographic view volume and refer to 
the bounding planes as follows: 

x = l = left plane, 
x = r = right plane, 
y = b = bottom plane, 
y = t = top plane, 
z = n = near plane, 
z = f = far plane. 

That vocabulary assumes a viewer who is looking along the minus z-axis with 
his head pointing in the y-direction.¹ This implies that n > f, which may be 
unintuitive, but if you assume the entire orthographic view volume has negative z 
values then the z = n “near” plane is closer to the viewer if and only if n > f; 
here f is a smaller number than n, i.e., a negative number of larger absolute value 
than n. 

This concept is shown in Figure 7.5. The transform from orthographic view 
volume to the canonical view volume is another windowing transform, so we can 
simply substitute the bounds of the orthographic and canonical view volumes into 
Equation (6.7) to obtain the matrix for this transformation: 

$$\mathbf{M}_{\text{orth}} = \begin{bmatrix}
\frac{2}{r - l} & 0 & 0 & -\frac{r + l}{r - l} \\
0 & \frac{2}{t - b} & 0 & -\frac{t + b}{t - b} \\
0 & 0 & \frac{2}{n - f} & -\frac{n + f}{n - f} \\
0 & 0 & 0 & 1
\end{bmatrix}. \tag{7.3}$$

Figure 7.4. The orthographic view volume. 

n and f appear in what might seem like reverse order because n - f, rather 
than f - n, is a positive number. 

Figure 7.5. The orthographic view volume is along the negative z-axis, so f is a more 
negative number than n, thus n > f. 

¹ Most programmers find it intuitive to have the x-axis pointing right and the y-axis pointing up. In 
a right-handed coordinate system, this implies that we are looking in the -z direction. Some systems 
use a left-handed coordinate system for viewing so that the gaze direction is along +z. Which is best 
is a matter of taste, and this text assumes a right-handed coordinate system. A reference that argues 
for the left-handed system instead is given in the notes at the end of the chapter. 


To draw 3D line segments in the orthographic view volume, we project them 
into screen x- and y-coordinates and ignore z-coordinates. We do this by 
combining Equations (7.2) and (7.3). Note that in a program we multiply the matrices 
together to form one matrix and then manipulate points as follows: 

$$\begin{bmatrix} x_{\text{pixel}} \\ y_{\text{pixel}} \\ z_{\text{canonical}} \\ 1 \end{bmatrix}
= (\mathbf{M}_{\text{vp}} \mathbf{M}_{\text{orth}}) \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}.$$

The z-coordinate will now be in [-1, 1]. We don’t take advantage of this now, but 
it will be useful when we examine z-buffer algorithms. 

This is a first example of how matrix transformation machinery makes graphics 
programs clean and efficient. 

Figure 7.6. The user specifies viewing as an eye position e, a gaze direction g, 
and an up vector t. We construct a right-handed basis with w pointing opposite 
to the gaze and v being in the same plane as g and t. 

The code to draw many 3D lines with endpoints a_i and b_i thus becomes both 
simple and efficient: 

    construct M_vp 
    construct M_orth 
    M = M_vp M_orth 
    for each line segment (a_i, b_i) do 
        p = M a_i 
        q = M b_i 
        drawline(x_p, y_p, x_q, y_q) 
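The loop above can be sketched in plain Python as follows. The function names, the tiny 4 by 2 pixel image, and the view volume bounds are assumptions chosen so the arithmetic is easy to follow; instead of calling a drawing routine, the sketch simply returns the screen-space endpoints that would be passed to one.

```python
def mat_mul(a, b):
    """Product of two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component homogeneous vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def viewport_matrix(nx, ny):
    return [[nx / 2, 0.0, 0.0, (nx - 1) / 2],
            [0.0, ny / 2, 0.0, (ny - 1) / 2],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def orthographic_matrix(l, r, b, t, n, f):
    """Maps the box [l, r] x [b, t] x [f, n] to the canonical view volume."""
    return [[2 / (r - l), 0.0, 0.0, -(r + l) / (r - l)],
            [0.0, 2 / (t - b), 0.0, -(t + b) / (t - b)],
            [0.0, 0.0, 2 / (n - f), -(n + f) / (n - f)],
            [0.0, 0.0, 0.0, 1.0]]

def project_segments(segments, M):
    """Return (x_p, y_p, x_q, y_q) for each 3D segment (a, b)."""
    result = []
    for a, b in segments:
        p = mat_vec(M, [a[0], a[1], a[2], 1.0])
        q = mat_vec(M, [b[0], b[1], b[2], 1.0])
        result.append((p[0], p[1], q[0], q[1]))
    return result

# One combined matrix is built once and reused for every endpoint.
M = mat_mul(viewport_matrix(4, 2),
            orthographic_matrix(-2.0, 2.0, -1.0, 1.0, -1.0, -5.0))
lines = project_segments([((-2.0, -1.0, -1.0), (2.0, 1.0, -5.0))], M)
```

The sample segment runs between opposite corners of the view volume, so its endpoints should project to opposite corners of the image, at (-0.5, -0.5) and (3.5, 1.5).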


7.1.3 The Camera Transformation 

We’d like to be able to change the viewpoint in 3D and look in any direction. There 
are a multitude of conventions for specifying viewer position and orientation. We 
will use the following one (see Figure 7.6): 

• the eye position e, 

• the gaze direction g, 

• the view-up vector t. 

The eye position is a location that the eye “sees from.” If you think of graphics 
as a photographic process, it is the center of the lens. The gaze direction is any 
vector in the direction that the viewer is looking. The view-up vector is any vector 
in the plane that both bisects the viewer’s head into right and left halves and points 
“to the sky” for a person standing on the ground. These vectors provide us with 
enough information to set up a coordinate system with origin e and a uvw basis, 
using the construction of Section 2.4.7: 

$$\mathbf{w} = -\frac{\mathbf{g}}{\|\mathbf{g}\|}, \qquad
\mathbf{u} = \frac{\mathbf{t} \times \mathbf{w}}{\|\mathbf{t} \times \mathbf{w}\|}, \qquad
\mathbf{v} = \mathbf{w} \times \mathbf{u}.$$

Figure 7.7. For arbitrary viewing, we need to change the points to be stored in the 
“appropriate” coordinate system. In this case it has origin e and offset coordinates in terms of uvw. 


Our job would be done if all points we wished to transform were stored in 
coordinates with origin e and basis vectors u, v, and w. But as shown in Figure 7.7, 
the coordinates of the model are stored in terms of the canonical (or world) 
origin o and the x-, y-, and z-axes. To use the machinery we have already developed, 
we just need to convert the coordinates of the line segment endpoints we wish to 
draw from xyz-coordinates into uvw-coordinates. This kind of transformation 
was discussed in Section 6.5, and the matrix that enacts this transformation is the 
canonical-to-basis matrix of the camera’s coordinate frame: 

$$\mathbf{M}_{\text{cam}} = \begin{bmatrix} \mathbf{u} & \mathbf{v} & \mathbf{w} & \mathbf{e} \\ 0 & 0 & 0 & 1 \end{bmatrix}^{-1}
= \begin{bmatrix} x_u & y_u & z_u & 0 \\ x_v & y_v & z_v & 0 \\ x_w & y_w & z_w & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 & -x_e \\ 0 & 1 & 0 & -y_e \\ 0 & 0 & 1 & -z_e \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$

Alternatively, we can think of this same transformation as first moving e to the 
origin, then aligning u, v, w to x, y, z. 

To make our previously z-axis-only viewing algorithm work for cameras with 
any location and orientation, we just need to add this camera transformation 
to the product of the viewport and projection transformations, so that it 
converts the incoming points from world to camera coordinates before they are 
projected: 

    construct M_vp 
    construct M_orth 
    construct M_cam 
    M = M_vp M_orth M_cam 
    for each line segment (a_i, b_i) do 
        p = M a_i 
        q = M b_i 
        drawline(x_p, y_p, x_q, y_q) 

Again, almost no code is needed once the matrix infrastructure is in place. 
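Constructing the camera matrix from an eye position, gaze direction, and view-up vector might look like the sketch below. It is plain Python with assumed helper names (`cross`, `dot`, `normalize`, `camera_matrix`); the rotation and translation are multiplied out by hand, so each row holds one basis vector followed by minus its dot product with e. The sketch assumes g is nonzero and t is not parallel to g.

```python
import math

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def normalize(a):
    length = math.sqrt(dot(a, a))
    return [c / length for c in a]

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component homogeneous vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def camera_matrix(e, g, t):
    """World-to-camera matrix for eye e, gaze direction g, view-up t."""
    w = normalize([-c for c in g])   # w points opposite to the gaze
    u = normalize(cross(t, w))
    v = cross(w, u)
    return [[u[0], u[1], u[2], -dot(u, e)],
            [v[0], v[1], v[2], -dot(v, e)],
            [w[0], w[1], w[2], -dot(w, e)],
            [0.0, 0.0, 0.0, 1.0]]

# A camera on the +x axis looking back toward the world origin.
M_cam = camera_matrix([3.0, 0.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
eye_in_cam = mat_vec(M_cam, [3.0, 0.0, 0.0, 1.0])     # the eye maps to the origin
origin_in_cam = mat_vec(M_cam, [0.0, 0.0, 0.0, 1.0])  # 3 units along -z
```

In camera coordinates the eye sits at the origin and the world origin, which lies 3 units straight ahead of this camera, ends up at z = -3, consistent with a viewer looking along the negative z-axis.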


7.2 Projective Transformations 


For the moment we will ignore the sign of z to keep the equations simpler, but it 
will return on page 152. 


We have left perspective for last because it takes a little bit of cleverness to make 
it fit into the system of vectors and matrix transformations that has served us so 
well up to now. To see what we need to do, let’s look at what the perspective 
projection transformation needs to do with points in camera space. Recall that the 
viewpoint is positioned at the origin and the camera is looking along the z-axis. 

The key property of perspective is that the size of an object on the screen is 
proportional to 1/z for an eye at the origin looking up the negative z-axis. This 
can be expressed more precisely in an equation for the geometry in Figure 7.8: 

$$y_s = \frac{d}{z}\,y, \tag{7.5}$$

where y is the distance of the point along the y-axis, and $y_s$ is where the point 
should be drawn on the screen. 

Figure 7.8. The geometry for Equation (7.5). The viewer's eye is at e and the gaze direction 
is g (the minus z-axis). The view plane is a distance d from the eye. A point is projected 
toward e and where it intersects the view plane is where it is drawn. 

We would really like to use the matrix machinery we developed for orthographic
projection to draw perspective images; we could then just multiply another
matrix into our composite matrix and use the algorithm we already have.
However, this type of transformation, in which one of the coordinates of the input
vector appears in the denominator, can't be achieved using affine transformations.

We can allow for division with a simple generalization of the mechanism of 
homogeneous coordinates that we have been using for affine transformations. 
We have agreed to represent the point (x, y, z) using the homogeneous vector
[x y z 1]^T; the extra coordinate, w, is always equal to 1, and this is ensured by
always using [0 0 0 1]^T as the fourth row of an affine transformation matrix.

Rather than just thinking of the 1 as an extra piece bolted on to coerce matrix
multiplication to implement translation, we now define it to be the denominator
of the x-, y-, and z-coordinates: the homogeneous vector [x y z w]^T represents
the point (x/w, y/w, z/w). This makes no difference when w = 1, but it allows a
broader range of transformations to be implemented if we allow any values in the
bottom row of a transformation matrix, causing w to take on values other than 1.

Concretely, linear transformations allow us to compute expressions like 

x' = ax + by + cz 

and affine transformations extend this to 

x' = ax + by + cz + d. 

Treating w as the denominator further expands the possibilities, allowing us to 
compute functions like 


$$x' = \frac{ax + by + cz + d}{ex + fy + gz + h};$$


this could be called a “linear rational function” of x, y, and z. But there is an extra 
constraint—the denominators are the same for all coordinates of the transformed 
point: 


$$x' = \frac{a_1 x + b_1 y + c_1 z + d_1}{ex + fy + gz + h},$$

$$y' = \frac{a_2 x + b_2 y + c_2 z + d_2}{ex + fy + gz + h},$$

$$z' = \frac{a_3 x + b_3 y + c_3 z + d_3}{ex + fy + gz + h}.$$
Figure 7.9. A projective transformation maps a square to a quadrilateral,
preserving straight lines but not parallel lines.


Expressed as a matrix transformation,

$$\begin{bmatrix} \tilde{x} \\ \tilde{y} \\ \tilde{z} \\ \tilde{w} \end{bmatrix} =
\begin{bmatrix} a_1 & b_1 & c_1 & d_1 \\ a_2 & b_2 & c_2 & d_2 \\ a_3 & b_3 & c_3 & d_3 \\ e & f & g & h \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}$$

and

$$(x', y', z') = (\tilde{x}/\tilde{w},\ \tilde{y}/\tilde{w},\ \tilde{z}/\tilde{w}).$$


A transformation like this is known as a projective transformation or a 
homography. 


Example. The matrix

$$M = \begin{bmatrix} 2 & 0 & -1 \\ 0 & 3 & 0 \\ 0 & \frac{2}{3} & \frac{1}{3} \end{bmatrix}$$

represents a 2D projective transformation that transforms the unit square ([0,1] x 
[0,1]) to the quadrilateral shown in Figure 7.9. 

For instance, the lower-right corner of the square at (1, 0) is represented by
the homogeneous vector [1 0 1]^T and transforms as follows:

$$\begin{bmatrix} 2 & 0 & -1 \\ 0 & 3 & 0 \\ 0 & \frac{2}{3} & \frac{1}{3} \end{bmatrix}
\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} =
\begin{bmatrix} 1 \\ 0 \\ \frac{1}{3} \end{bmatrix},$$

which represents the point (1/(1/3), 0/(1/3)), or (3, 0). Note that if we use the matrix

$$3M = \begin{bmatrix} 6 & 0 & -3 \\ 0 & 9 & 0 \\ 0 & 2 & 1 \end{bmatrix}$$

instead, the result is [3 0 1]^T, which also represents (3, 0). In fact, any scalar
multiple cM is equivalent: the numerator and denominator are both scaled by c, 
which does not change the result. 
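
The example above can be checked numerically. The following is a minimal sketch, not code from the text: the helper name `apply_homography` is hypothetical, and exact rational arithmetic from Python's standard `fractions` module is used so that the entries 2/3 and 1/3 introduce no rounding.

```python
from fractions import Fraction as F

def apply_homography(M, point):
    """Apply a 3x3 projective matrix M to a 2D point, then divide by w."""
    x, y = point
    v = (x, y, 1)  # homogeneous vector [x y 1]^T
    xt, yt, wt = (sum(M[i][j] * v[j] for j in range(3)) for i in range(3))
    return (xt / wt, yt / wt)

# The example matrix M and its scalar multiple 3M.
M = [[F(2), F(0), F(-1)],
     [F(0), F(3), F(0)],
     [F(0), F(2, 3), F(1, 3)]]
M3 = [[3 * entry for entry in row] for row in M]

corner = (F(1), F(0))  # lower-right corner of the unit square
print(apply_homography(M, corner))   # (Fraction(3, 1), Fraction(0, 1))
print(apply_homography(M3, corner))  # the same point: scaling M changes nothing
```

Both calls give the point (3, 0), because the common factor 3 cancels in the division by w.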


There is a more elegant way of expressing the same idea, which avoids treating
the w-coordinate specially. In this view a 3D projective transformation is simply
a 4D linear transformation, with the extra stipulation that all scalar multiples of a
vector refer to the same point:

$$\mathbf{x} \sim \alpha\mathbf{x} \quad \text{for all } \alpha \neq 0.$$


The symbol ~ is read as “is equivalent to” and means that the two homogeneous 
vectors both describe the same point in space. 
Figure 7.10. The point x = 1.5 is represented by any point on the line x = 1.5h, such
as points at the hollow circles. However, before we interpret x as a conventional Cartesian
coordinate, we first divide by h to get (x, h) = (1.5, 1) as shown by the black point.


Example. In 1D homogeneous coordinates, in which we use 2-vectors to represent
points on the real line, we could represent the point (1.5) using the homogeneous
vector [1.5 1]^T, or any other point on the line x = 1.5h in homogeneous
space. (See Figure 7.10.)

In 2D homogeneous coordinates, in which we use 3-vectors to represent points
in the plane, we could represent the point (−1, −0.5) using the homogeneous
vector [−2 −1 2]^T, or any other point on the line x = α[−1 −0.5 1]^T. Any
homogeneous vector on the line can be mapped to the line's intersection with the
plane w = 1 to obtain its Cartesian coordinates. (See Figure 7.11.)

It's fine to transform homogeneous vectors as many times as needed, without
worrying about the value of the w-coordinate; in fact, it is fine if the
w-coordinate is zero at some intermediate phase. It is only when we want the
ordinary Cartesian coordinates of a point that we need to normalize to an equivalent
point that has w = 1, which amounts to dividing all the coordinates by w. Once
we've done this we are allowed to read off the (x, y, z)-coordinates from the first
three components of the homogeneous vector. 
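
As a small illustration of this normalization step, here is a sketch with a hypothetical helper `homogenize`, which is not a function from the text. It divides by the last component and refuses vectors with w = 0, since those have no Cartesian equivalent at the moment of reading off coordinates.

```python
def homogenize(v):
    """Divide a homogeneous vector by its last component w (requires w != 0)."""
    w = v[-1]
    if w == 0:
        raise ValueError("w = 0: cannot read off Cartesian coordinates")
    return tuple(c / w for c in v[:-1])

# Two 1D homogeneous vectors on the line x = 1.5h name the same point.
print(homogenize((1.5, 1.0)))         # (1.5,)
print(homogenize((3.0, 2.0)))         # (1.5,)
# The 2D example from the text.
print(homogenize((-2.0, -1.0, 2.0)))  # (-1.0, -0.5)
```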


7.3 Perspective Projection 


Figure 7.11. A point in homogeneous coordinates is equivalent to any other
point on the line through it and the origin, and normalizing the point amounts
to intersecting this line with the plane w = 1. (The figure labels the point (−2, −1, 2).)


The mechanism of projective transformations makes it simple to implement the
division by z required to implement perspective. In the 2D example shown in
Figure 7.8, we can implement the perspective projection with a matrix transformation
as follows:

$$\begin{bmatrix} y_s \\ 1 \end{bmatrix} \sim
\begin{bmatrix} d & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} y \\ z \\ 1 \end{bmatrix}.$$

This transforms the 2D homogeneous vector [y z 1]^T to the 1D homogeneous
vector [dy z]^T, which represents the 1D point (dy/z) (because it is equivalent to
the 1D homogeneous vector [dy/z 1]^T). This matches Equation (7.5).

Remember, n < 0. More on this later.

For the “official” perspective projection matrix in 3D, we'll adopt our usual
convention of a camera at the origin facing in the −z direction, so the distance
of the point (x, y, z) is −z. As with orthographic projection, we also adopt the
notion of near and far planes that limit the range of distances to be seen. In this
context, we will use the near plane as the projection plane, so the image plane
distance is −n.

The desired mapping is then y_s = (n/z)y, and similarly for x. This transformation
can be implemented by the perspective matrix:

$$P = \begin{bmatrix} n & 0 & 0 & 0 \\ 0 & n & 0 & 0 \\ 0 & 0 & n + f & -fn \\ 0 & 0 & 1 & 0 \end{bmatrix}.$$


The first, second, and fourth rows simply implement the perspective equation.
The third row, as in the orthographic and viewport matrices, is designed to bring
the z-coordinate “along for the ride” so that we can use it later for hidden surface
removal. In the perspective projection, though, the addition of a non-constant
denominator prevents us from actually preserving the value of z; it's actually
impossible to keep z from changing while getting x and y to do what we need
them to do. Instead we've opted to keep z unchanged for points on the near or far
planes.

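There are many matrices that could function as perspective matrices, and all
of them non-linearly distort the z-coordinate. This specific matrix has the nice
properties shown in Figures 7.12 and 7.13; it leaves points on the (z = n)-plane
entirely alone, and it keeps points on the (z = f)-plane on that plane while
“squishing” them in x and y by the appropriate amount. The effect of the matrix
on a point (x, y, z) is

$$P \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} =
\begin{bmatrix} nx \\ ny \\ (n + f)z - fn \\ z \end{bmatrix} \sim
\begin{bmatrix} \frac{nx}{z} \\ \frac{ny}{z} \\ n + f - \frac{fn}{z} \\ 1 \end{bmatrix}.$$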
Figure 7.12. The perspective projection leaves points on the z = n plane unchanged and
maps the large z = f rectangle at the back of the perspective volume to the small z = f
rectangle at the back of the orthographic volume.

Figure 7.13. The perspective projection maps any line through the origin/eye to a line
parallel to the z-axis and without moving the point on the line at z = n.

This matrix is not literally the inverse of the matrix P, but the transformation
it describes is the inverse of the transformation described by P.


As you can see, x and y are scaled and, more importantly, divided by z. Because
both n and z (inside the view volume) are negative, there are no “flips” in x
and y. Although it is not obvious (see the exercise at the end of the chapter),
the transform also preserves the relative order of z values between z = n and
z = f, allowing us to do depth ordering after this matrix is applied. This will be
important later when we do hidden surface elimination. 
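
These properties are easy to check numerically. The sketch below is illustrative only: the function names and the sample values n = −1, f = −10 are assumptions, not taken from the text. It builds the perspective matrix described above, applies it to points on the near and far planes, and tests that depth order survives the transformation.

```python
def perspective_matrix(n, f):
    """Perspective matrix with near plane z = n and far plane z = f (both negative)."""
    return [[n, 0, 0, 0],
            [0, n, 0, 0],
            [0, 0, n + f, -f * n],
            [0, 0, 1, 0]]

def transform_point(M, p):
    """Multiply [x y z 1]^T by M and divide by the resulting w."""
    v = (p[0], p[1], p[2], 1.0)
    out = [sum(M[i][j] * v[j] for j in range(4)) for i in range(4)]
    w = out[3]
    return (out[0] / w, out[1] / w, out[2] / w)

n, f = -1.0, -10.0  # sample values chosen for illustration
P = perspective_matrix(n, f)

print(transform_point(P, (2.0, 1.0, n)))  # near plane: the point is unchanged
print(transform_point(P, (2.0, 1.0, f)))  # far plane: x, y scaled by n/f, z stays f

# Depth order is preserved between the near and far planes.
depths = [transform_point(P, (0.0, 0.0, z))[2] for z in (-1.0, -2.0, -5.0, -10.0)]
print(depths == sorted(depths, reverse=True))
```

The intermediate depths are not preserved (z = −2 maps to −6 with these values); only their ordering is.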

Sometimes we will want to take the inverse of P, for example to bring a screen 
coordinate plus z back to the original space, as we might want to do for picking. 
The inverse is 


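$$P^{-1} = \begin{bmatrix} \frac{1}{n} & 0 & 0 & 0 \\ 0 & \frac{1}{n} & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & -\frac{1}{fn} & \frac{n + f}{fn} \end{bmatrix}.$$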


Since multiplying a homogeneous vector by a scalar does not change its meaning, 
the same is true of matrices that operate on homogeneous vectors. So we can 
write the inverse matrix in a prettier form by multiplying through by nf: 


$$P^{-1} = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 0 & fn \\ 0 & 0 & -1 & n + f \end{bmatrix}.$$


Taken in the context of the orthographic projection matrix M_orth in Equation (7.3),
the perspective matrix simply maps the perspective view volume (which
is shaped like a slice, or frustum, of a pyramid) to the orthographic view volume
(which is an axis-aligned box). The beauty of the perspective matrix is that once
we apply it, we can use an orthographic transform to get to the canonical view
volume. Thus, all of the orthographic machinery applies, and all that we have
added is one matrix and the division by w. It is also heartening that we are not
“wasting” the bottom row of our four by four matrices!

Concatenating P with M_orth results in the perspective projection matrix,

$$M_{per} = M_{orth} P.$$


One issue, however, is: How are l, r, b, t determined for perspective? They
identify the “window” through which we look. Since the perspective matrix does
not change the values of x and y on the (z = n)-plane, we can specify (l, r, b, t)
on that plane.

To integrate the perspective matrix into our orthographic infrastructure, we
simply replace M_orth with M_per, which inserts the perspective matrix P after the
camera matrix M_cam has been applied but before the orthographic projection. So
the full set of matrices for perspective viewing is

$$M = M_{vp} M_{orth} P M_{cam}.$$

The resulting algorithm is: 


compute M_vp
compute M_per
compute M_cam
M = M_vp M_per M_cam
for each line segment (a_i, b_i) do
    p = M a_i
    q = M b_i
    drawline(x_p/w_p, y_p/w_p, x_q/w_q, y_q/w_q)


Note that the only change other than the additional matrix is the divide by the 
homogeneous coordinate w. 
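
The algorithm above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a reference implementation: the forms of the viewport and orthographic matrices are assumed from the standard definitions earlier in the chapter (they are not repeated in this section), all function names are hypothetical, and the function returns the 2D endpoints instead of calling a real `drawline`.

```python
def matmul(A, B):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def matvec(M, v):
    """Product of a 4x4 matrix and a 4-vector."""
    return [sum(M[i][j] * v[j] for j in range(4)) for i in range(4)]

def viewport(nx, ny):
    # Assumed form: maps x, y in [-1, 1] to pixel coordinates [-0.5, nx - 0.5] etc.
    return [[nx / 2, 0, 0, (nx - 1) / 2],
            [0, ny / 2, 0, (ny - 1) / 2],
            [0, 0, 1, 0],
            [0, 0, 0, 1]]

def orthographic(l, r, b, t, n, f):
    # Assumed form: maps the box [l, r] x [b, t] x [f, n] to [-1, 1]^3.
    return [[2 / (r - l), 0, 0, -(r + l) / (r - l)],
            [0, 2 / (t - b), 0, -(t + b) / (t - b)],
            [0, 0, 2 / (n - f), -(n + f) / (n - f)],
            [0, 0, 0, 1]]

def perspective(n, f):
    return [[n, 0, 0, 0], [0, n, 0, 0], [0, 0, n + f, -f * n], [0, 0, 1, 0]]

def view_segments(segments, M_cam, l, r, b, t, n, f, nx, ny):
    """Return (x_p, y_p, x_q, y_q) in pixel coordinates for each 3D segment."""
    M_per = matmul(orthographic(l, r, b, t, n, f), perspective(n, f))
    M = matmul(viewport(nx, ny), matmul(M_per, M_cam))
    lines = []
    for a, b_pt in segments:
        p = matvec(M, [a[0], a[1], a[2], 1.0])
        q = matvec(M, [b_pt[0], b_pt[1], b_pt[2], 1.0])
        # The only change from the orthographic version: divide by w.
        lines.append((p[0] / p[3], p[1] / p[3], q[0] / q[3], q[1] / q[3]))
    return lines

identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
diagonal = [((-1.0, -1.0, -1.0), (1.0, 1.0, -1.0))]  # spans the near-plane window
print(view_segments(diagonal, identity, -1, 1, -1, 1, -1.0, -10.0, 100, 100))
```

With the camera at the origin (identity camera matrix) and a 100 × 100 image, the diagonal of the near-plane window should land on the corners of the pixel grid, roughly (−0.5, −0.5) to (99.5, 99.5).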

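Multiplied out, the matrix M_per looks like this:

$$M_{per} = \begin{bmatrix}
\frac{2n}{r - l} & 0 & \frac{l + r}{l - r} & 0 \\
0 & \frac{2n}{t - b} & \frac{b + t}{b - t} & 0 \\
0 & 0 & \frac{f + n}{n - f} & \frac{2fn}{f - n} \\
0 & 0 & 1 & 0
\end{bmatrix}.$$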


This or similar matrices often appear in documentation, and they are less mysterious
when one realizes that they are usually the product of a few simple matrices.


Example. Many APIs such as OpenGL (Shreiner et al., 2004) use the same canonical
view volume as presented here. They also usually have the user specify the
absolute values of n and f. The projection matrix for OpenGL is

$$M_{OpenGL} = \begin{bmatrix}
\frac{2|n|}{r - l} & 0 & \frac{r + l}{r - l} & 0 \\
0 & \frac{2|n|}{t - b} & \frac{t + b}{t - b} & 0 \\
0 & 0 & \frac{|n| + |f|}{|n| - |f|} & \frac{2|f||n|}{|n| - |f|} \\
0 & 0 & -1 & 0
\end{bmatrix}.$$

Other APIs set n and f to 0 and 1, respectively. Blinn (J. Blinn, 1996) recommends
making the canonical view volume [0, 1]^3 for efficiency. All such decisions
will change the projection matrix slightly.
7.4 Some Properties of the Perspective Transform

An important property of the perspective transform is that it takes lines to lines
and planes to planes. In addition, it takes line segments in the view volume to line
segments in the canonical volume. To see this, consider the line segment

$$\mathbf{q} + t(\mathbf{Q} - \mathbf{q}).$$

When transformed by a 4 × 4 matrix M, it is a point with possibly varying
homogeneous coordinate:

$$M\mathbf{q} + t(M\mathbf{Q} - M\mathbf{q}) = \mathbf{r} + t(\mathbf{R} - \mathbf{r}).$$

The homogenized 3D line segment is

$$\frac{\mathbf{r} + t(\mathbf{R} - \mathbf{r})}{w_r + t(w_R - w_r)}. \tag{7.6}$$

If Equation (7.6) can be rewritten in a form

$$\frac{\mathbf{r}}{w_r} + f(t)\left(\frac{\mathbf{R}}{w_R} - \frac{\mathbf{r}}{w_r}\right), \tag{7.7}$$

then all the homogenized points lie on a 3D line. Brute force manipulation of
Equation (7.6) yields such a form with

$$f(t) = \frac{w_R t}{w_r + t(w_R - w_r)}. \tag{7.8}$$


It also turns out that the line segments do map to line segments preserving the
ordering of the points (Exercise 11), i.e., they do not get reordered or “torn.”

A byproduct of the transform taking line segments to line segments is that 
it takes the edges and vertices of a triangle to the edges and vertices of another 
triangle. Thus, it takes triangles to triangles and planes to planes. 
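
The “brute force manipulation” mentioned above is short enough to write out. As a check (a worked derivation, with D introduced here as shorthand for the common denominator), substituting the f(t) of Equation (7.8) into the form (7.7) recovers Equation (7.6):

```latex
\begin{aligned}
\frac{\mathbf{r}}{w_r} + f(t)\left(\frac{\mathbf{R}}{w_R} - \frac{\mathbf{r}}{w_r}\right)
  &= \frac{\mathbf{r}}{w_r} + \frac{w_R t}{D}\cdot\frac{\mathbf{R}}{w_R}
     - \frac{w_R t}{D}\cdot\frac{\mathbf{r}}{w_r},
     \qquad D = w_r + t(w_R - w_r) \\
  &= \frac{t\,\mathbf{R}}{D} + \frac{(D - w_R t)\,\mathbf{r}}{D\,w_r} \\
  &= \frac{t\,\mathbf{R}}{D} + \frac{(1 - t)\,w_r\,\mathbf{r}}{D\,w_r} \\
  &= \frac{\mathbf{r} + t(\mathbf{R} - \mathbf{r})}{w_r + t(w_R - w_r)}.
\end{aligned}
```

The third line uses D − w_R t = w_r(1 − t), which follows directly from the definition of D.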


7.5 Field-of-View 

While we can specify any window using the (l, r, b, t) and n values, sometimes
we would like to have a simpler system where we look through the center of the 
window. This implies the constraint that 

l = -r, 
b = —t. 
Figure 7.14. The field-of-view θ is the angle from the bottom of the screen to the top of the
screen as measured from the eye.

If we also add the constraint that the pixels are square, i.e., there is no distortion 
of shape in the image, then the ratio of r to t must be the same as the ratio of the 
number of horizontal pixels to the number of vertical pixels: 

$$\frac{n_x}{n_y} = \frac{r}{t}.$$

Once n_x and n_y are specified, this leaves only one degree of freedom. That is
often set using the field-of-view shown as θ in Figure 7.14. This is sometimes
called the vertical field-of-view to distinguish it from the angle between left and
right sides or from the angle between diagonal corners. From the figure we can
see that

$$\tan\frac{\theta}{2} = \frac{t}{|n|}.$$

If n and θ are specified, then we can derive t, and use code for the more general
viewing system. In some systems, the value of n is hard-coded to some reasonable 
value, and thus we have one fewer degree of freedom. 
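
Putting the three constraints together (centered window, square pixels, and the field-of-view relation), the window can be derived in a few lines. This is a sketch; the function name `window_from_fov` and the choice of degrees for the angle are assumptions made for illustration.

```python
import math

def window_from_fov(theta_deg, n, nx, ny):
    """Return (l, r, b, t) for a centered window with square pixels.

    theta_deg is the vertical field-of-view in degrees, n the (negative)
    near-plane coordinate, and nx, ny the image size in pixels.
    """
    t = abs(n) * math.tan(math.radians(theta_deg) / 2)  # tan(theta/2) = t/|n|
    r = t * nx / ny                                     # n_x/n_y = r/t
    return (-r, r, -t, t)                               # l = -r, b = -t

print(window_from_fov(90.0, -1.0, 640, 480))
```

For a 90-degree field-of-view with n = −1, t is approximately 1 and r approximately 4/3 for a 640 × 480 image.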


Frequently Asked Questions 

• Is orthographic projection ever useful in practice? 

It is useful in applications where relative length judgements are important. It can 
also yield simplifications where perspective would be too expensive as occurs in 
some medical visualization applications. 

• The tessellated spheres I draw in perspective look like ovals. Is this a 
bug? 
No. It is correct behavior. If you place your eye in the same relative position to
the screen as the virtual viewer has with respect to the viewport, then these ovals
will look like circles because they themselves are viewed at an angle.

• Does the perspective matrix take negative z values to positive z values 
with a reversed ordering? Doesn’t that cause trouble? 

Yes. The equation for transformed z is

$$z' = n + f - \frac{fn}{z}.$$

So z = +ε is transformed to z' = −∞ and z = −ε is transformed to z' = ∞.
So any line segments that span z = 0 will be “torn” although all points will be
projected to an appropriate screen location. This tearing is not relevant when all
objects are contained in the viewing volume. This is usually assured by clipping
to the view volume. However, clipping itself is made more complicated by the
tearing phenomenon as is discussed in Chapter 8.


• The perspective matrix changes the value of the homogeneous coordinate.
Doesn't that make the move and scale transformations no longer
work properly?

Applying a translation to a homogeneous point we have 


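$$\begin{bmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} hx \\ hy \\ hz \\ h \end{bmatrix} =
\begin{bmatrix} hx + ht_x \\ hy + ht_y \\ hz + ht_z \\ h \end{bmatrix}
\xrightarrow{\text{homogenize}}
\begin{bmatrix} x + t_x \\ y + t_y \\ z + t_z \\ 1 \end{bmatrix}.$$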

Similar effects are true for other transforms (see Exercise 5). 


Notes 

Most of the discussion of viewing matrices is based on information in Real-Time
Rendering (Akenine-Möller et al., 2008), the OpenGL Programming Guide (Shreiner
et al., 2004), Computer Graphics (Hearn & Baker, 1986), and 3D Game Engine
Design (Eberly, 2000).
Exercises

1. Construct the viewport matrix required for a system in which pixel coordinates
count down from the top of the image, rather than up from the bottom.

2. Multiply the viewport and orthographic projection matrices, and show that 
the result can also be obtained by a single application of Equation (6.7). 

3. Derive the third row of Equation (7.3) from the constraint that z is preserved 
for points on the near and far planes. 

4. Show algebraically that the perspective matrix preserves order of z values 
within the view volume. 

5. For a 4 x 4 matrix whose top three rows are arbitrary and whose bottom row 
is (0, 0,0,1), show that the points (x, y , z , 1) and (hx, hy, hz , h) transform 
to the same point after homogenization. 

6. Verify that the form of P^{-1} given in the text is correct.

7. Verify that the full perspective to canonical matrix M_projection takes (r, t, n)
to (1, 1, 1).

8. Write down a perspective matrix for n = 1, f = 2.

9. For the point p = (x, y, z, 1), what are the homogenized and unhomogenized
results for that point transformed by the perspective matrix in Exercise 8?

10. For the eye position e = (0, 1, 0), a gaze vector g = (0, −1, 0), and a view-up
vector t = (1, 1, 0), what is the resulting orthonormal uvw basis used
for coordinate rotations?

11. Show that, for a perspective transform, line segments that start in the view
volume do map to line segments in the canonical volume after homogenization.
Further, show that the relative ordering of points on the two segments
is the same. Hint: Show that the f(t) in Equation (7.8) has the properties
f(0) = 0, f(1) = 1, the derivative of f is positive for all t ∈ [0, 1], and the
homogeneous coordinate does not change sign.
The Graphics Pipeline


The previous several chapters have established the mathematical scaffolding we
need to look at the second major approach to rendering: drawing objects one by
one onto the screen, or object-order rendering. Unlike in ray tracing, where we
consider each pixel in turn and find the objects that influence its color, we'll now
instead consider each geometric object in turn and find the pixels that it could have
an effect on. The process of finding all the pixels in an image that are occupied by
a geometric primitive is called rasterization, so object-order rendering can also
be called rendering by rasterization. The sequence of operations that is required,
starting with objects and ending by updating pixels in the image, is known as the
graphics pipeline.

Object-order rendering has enjoyed great success because of its efficiency. 
For large scenes, management of data access patterns is crucial to performance, 
and making a single pass over the scene visiting each bit of geometry once has 
significant advantages over repeatedly searching the scene to retrieve the objects 
required to shade each pixel. 

The title of this chapter suggests that there is only one way to do object-order
rendering. Of course this isn't true: two quite different examples of graphics
pipelines with very different goals are the hardware pipelines used to support
interactive rendering via APIs like OpenGL and Direct3D and the software
pipelines used in film production, supporting APIs like RenderMan. Hardware
pipelines must run fast enough to react in real time for games, visualizations,
and user interfaces. Production pipelines must render the highest quality animation
and visual effects possible and scale to enormous scenes, but may take much
more time to do so. Despite the different design decisions resulting from these
divergent goals, a remarkable amount is shared among most, if not all, pipelines,
and this chapter attempts to focus on these common fundamentals, erring on the
side of following the hardware pipelines more closely.

Any graphics system has one or more types of “primitive object” that it can
handle directly, and more complex objects are converted into these “primitives.”
Triangles are the most often used primitive.

Rasterization-based systems are also called scanline renderers.

Figure 8.1. The stages of a graphics pipeline. (Labels in the figure: application,
command stream, vertex processing, transformed geometry, rasterization,
fragments, framebuffer image, display.)

The work that needs to be done in object-order rendering can be organized
into the task of rasterization itself, the operations that are done to geometry
before rasterization, and the operations that are done to pixels after rasterization.
The most common geometric operation is applying matrix transformations, as
discussed in the previous two chapters, to map the points that define the geometry
from object space to screen space, so that the input to the rasterizer is expressed
in pixel coordinates, or screen space. The most common pixelwise operation is
hidden surface removal, which arranges for surfaces closer to the viewer to appear
in front of surfaces farther from the viewer. Many other operations also can be
included at each stage, thereby achieving a wide range of different rendering effects
using the same general process.

For the purposes of this chapter we'll discuss the graphics pipeline in terms of
four stages (Figure 8.1). Geometric objects are fed into the pipeline from an
interactive application or from a scene description file, and they are always described
by sets of vertices. The vertices are operated on in the vertex-processing stage,
then the primitives using those vertices are sent to the rasterization stage. The
rasterizer breaks each primitive into a number of fragments, one for each pixel
covered by the primitive. The fragments are processed in the fragment processing
stage, and then the various fragments corresponding to each pixel are combined
in the fragment blending stage.

We'll begin by discussing rasterization, then illustrate the purpose of the
geometric and pixel-wise stages by a series of examples.


8.1 Rasterization 

Rasterization is the central operation in object-order graphics, and the rasterizer 
is central to any graphics pipeline. For each primitive that comes in, the rasterizer 
has two jobs: it enumerates the pixels that are covered by the primitive and it 
interpolates values, called attributes, across the primitive—the purpose for these 
attributes will be clear with later examples. The output of the rasterizer is a set of 
fragments, one for each pixel covered by the primitive. Each fragment “lives” at 
a particular pixel and carries its own set of attribute values. 

In this chapter, we will present rasterization with a view toward using it to
render three-dimensional scenes. The same rasterization methods are used to draw
lines and shapes in 2D as well, although it is becoming more and more common
to use the 3D graphics system “under the covers” to do all 2D drawing.


8.1.1 Line Drawing 


Most graphics packages contain a line drawing command that takes two endpoints
in screen coordinates (see Figure 3.10) and draws a line between them. For example,
the call for endpoints (1,1) and (3,2) would turn on pixels (1,1) and (3,2) and
fill in one pixel between them. For general screen coordinate endpoints (x_0, y_0)
and (x_1, y_1), the routine should draw some “reasonable” set of pixels that
approximate a line between them. Drawing such lines is based on line equations, and we
have two types of equations to choose from: implicit and parametric. This section
describes the approach using implicit lines.


Even though we often use integer-valued endpoints for examples, it's important
to properly support arbitrary endpoints.


Line Drawing Using Implicit Line Equations 

The most common way to draw lines using implicit equations is the midpoint
algorithm (Pitteway (1967); van Aken and Novak (1985)). The midpoint algorithm
ends up drawing the same lines as the Bresenham algorithm (Bresenham, 1965)
but it is somewhat more straightforward.

The first thing to do is find the implicit equation for the line as discussed in 
Section 2.5.2: 


$$f(x, y) = (y_0 - y_1)x + (x_1 - x_0)y + x_0 y_1 - x_1 y_0 = 0. \tag{8.1}$$

We assume that x_0 < x_1. If that is not true, we swap the points so that it is true.
The slope m of the line is given by

$$m = \frac{y_1 - y_0}{x_1 - x_0}.$$

The following discussion assumes m ∈ (0, 1]. Analogous discussions can be
derived for m ∈ (−∞, −1], m ∈ (−1, 0], and m ∈ (1, ∞). The four cases cover
all possibilities.

For the case m ∈ (0, 1], there is more “run” than “rise,” i.e., the line is moving
faster in x than in y. If we have an API where the y-axis points downwards,
we might have a concern about whether this makes the process harder, but, in
fact, we can ignore that detail. We can ignore the geometric notions of “up”
and “down,” because the algebra is exactly the same for the two cases. Cautious
readers can confirm that the resulting algorithm works for the y-axis downwards
case. The key assumption of the midpoint algorithm is that we draw the thinnest
line possible that has no gaps. A diagonal connection between two pixels is not
considered a gap.

Figure 8.2. Three “reasonable” lines that go seven pixels horizontally
and three pixels vertically.

Figure 8.3. Top: the line goes above the midpoint so the top pixel is drawn.
Bottom: the line goes below the midpoint so the bottom pixel is drawn.

As the line progresses from the left endpoint to the right, there are only two 
possibilities: draw a pixel at the same height as the pixel drawn to its left, or draw 
a pixel one higher. There will always be exactly one pixel in each column of pixels 
between the endpoints. Zero would imply a gap, and two would be too thick a line. 
There may be two pixels in the same row for the case we are considering; the line 
is more horizontal than vertical so sometimes it will go right, and sometimes up. 
This concept is shown in Figure 8.2, where three “reasonable” lines are shown, 
each advancing more in the horizontal direction than in the vertical direction. 

The midpoint algorithm for m ∈ (0, 1] first establishes the leftmost pixel and
the column number (x-value) of the rightmost pixel and then loops horizontally
establishing the row (y-value) of each pixel. The basic form of the algorithm is:

y = y_0
for x = x_0 to x_1 do
    draw(x, y)
    if (some condition) then
        y = y + 1

Note that x and y are integers. In words this says, “keep drawing pixels from left 
to right and sometimes move upwards in the y-direction while doing so.” The key 
is to establish efficient ways to make the decision in the if statement. 

An effective way to make the choice is to look at the midpoint of the line 
between the two potential pixel centers. More specifically, the pixel just drawn 
is pixel (x, y) whose center in real screen coordinates is at (x, y). The candidate 
pixels to be drawn to the right are pixels (x+1, y) and (x+1, y+1). The midpoint
between the centers of the two candidate pixels is (x + 1 ,y + 0.5). If the line 
passes below this midpoint we draw the bottom pixel, and otherwise we draw the 
top pixel (Figure 8.3). 

To decide whether the line passes above or below (x + 1, y + 0.5), we evaluate
f(x + 1, y + 0.5) in Equation (8.1). Recall from Section 2.5.1 that f(x, y) = 0 for
points (x, y) on the line, f(x, y) > 0 for points on one side of the line, and
f(x, y) < 0 for points on the other side of the line. Because −f(x, y) = 0 and
f(x, y) = 0 are both perfectly good equations for the line, it is not immediately
clear whether f(x, y) being positive indicates that (x, y) is above the line, or
whether it is below. However, we can figure it out; the key term in Equation (8.1)
is the y term (x_1 − x_0)y. Note that (x_1 − x_0) is definitely positive because
x_1 > x_0. This means that as y increases, the term (x_1 − x_0)y gets larger (i.e., more
positive or less negative). Thus, the case f(x, +∞) is definitely positive, and
definitely above the line, implying points above the line are all positive. Another
way to look at it is that the y component of the gradient vector is positive. So
above the line, where y can increase arbitrarily, f(x, y) must be positive. This
means we can make our code more specific by filling in the if statement:

if f(x + 1, y + 0.5) < 0 then
    y = y + 1

The above code will work nicely for lines of the appropriate slope (i.e., between 
zero and one). The reader can work out the other three cases which differ only in 
small details. 
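
A direct transcription of this loop into Python might look as follows. It is a minimal sketch restricted to the case discussed here: the function name `midpoint_line` is hypothetical, endpoints are assumed to be integers, the slope is assumed to lie in (0, 1], and pixels are collected in a list rather than drawn.

```python
def midpoint_line(x0, y0, x1, y1):
    """Pixels of a line with integer endpoints and slope in (0, 1]."""
    if x0 > x1:  # swap endpoints so that x0 < x1
        x0, y0, x1, y1 = x1, y1, x0, y0

    def f(x, y):
        # Implicit line equation (8.1); positive above the line.
        return (y0 - y1) * x + (x1 - x0) * y + x0 * y1 - x1 * y0

    pixels = []
    y = y0
    for x in range(x0, x1 + 1):
        pixels.append((x, y))
        # Midpoint below the line: the line passes above it, so step up.
        if f(x + 1, y + 0.5) < 0:
            y += 1
    return pixels

print(midpoint_line(1, 1, 3, 2))  # [(1, 1), (2, 1), (3, 2)]
```

For the endpoints (1,1) and (3,2) used earlier, this turns on both endpoints and one pixel between them. Note that a midpoint lying exactly on the line (f = 0) keeps the lower pixel in this sketch.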

If greater efficiency is desired, using an incremental method can help. An
incremental method tries to make a loop more efficient by reusing computation
from the previous step. In the midpoint algorithm as presented, the main
computation is the evaluation of f(x + 1, y + 0.5). Note that inside the loop, after the
first iteration, either we already evaluated f(x, y + 0.5) or f(x, y − 0.5)
(Figure 8.4). Note also this relationship:

$$f(x + 1, y) = f(x, y) + (y_0 - y_1)$$

$$f(x + 1, y + 1) = f(x, y) + (y_0 - y_1) + (x_1 - x_0).$$

This allows us to write an incremental version of the code:

y = y_0
d = f(x_0 + 1, y_0 + 0.5)
for x = x_0 to x_1 do
    draw(x, y)
    if d < 0 then
        y = y + 1
        d = d + (x_1 − x_0) + (y_0 − y_1)
    else
        d = d + (y_0 − y_1)

Figure 8.4. When using the decision point shown between the two light gray
pixels, we just drew the dark gray pixel, so we evaluated f at one of the two left
points shown.

This code should run faster since it has little extra setup cost compared to the
non-incremental version (that is not always true for incremental algorithms), but
it may accumulate more numeric error because the evaluation of f(x, y + 0.5)
may be composed of many adds for long lines. However, given that lines are
rarely longer than a few thousand pixels, such an error is unlikely to be critical.
Slightly longer setup cost, but faster loop execution, can be achieved by storing
(x_1 − x_0) + (y_0 − y_1) and (y_0 − y_1) as variables. We might hope a good compiler
would do that for us, but if the code is critical, it would be wise to examine the 
results of compilation to make sure. 
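
The incremental version, with the two increments stored as variables as suggested, can be sketched like this. As before, the function name is hypothetical and the sketch assumes integer endpoints and a slope in (0, 1].

```python
def midpoint_line_incremental(x0, y0, x1, y1):
    """Incremental midpoint algorithm for integer endpoints, slope in (0, 1]."""
    if x0 > x1:  # swap endpoints so that x0 < x1
        x0, y0, x1, y1 = x1, y1, x0, y0

    step_flat = y0 - y1            # change in d when y stays the same
    step_up = (x1 - x0) + step_flat  # change in d when y increases by one
    # Initial decision value d = f(x0 + 1, y0 + 0.5) from Equation (8.1).
    d = (y0 - y1) * (x0 + 1) + (x1 - x0) * (y0 + 0.5) + x0 * y1 - x1 * y0

    pixels = []
    y = y0
    for x in range(x0, x1 + 1):
        pixels.append((x, y))
        if d < 0:
            y += 1
            d += step_up
        else:
            d += step_flat
    return pixels

print(midpoint_line_incremental(0, 0, 7, 3))
```

With integer endpoints, d only ever takes values that are multiples of 0.5, so no rounding error accumulates in this particular setting; the concern about accumulated error applies to arbitrary real-valued endpoints.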
8.1.2 Triangle Rasterization

We often want to draw a 2D triangle with 2D points p_0 = (x_0, y_0), p_1 = (x_1, y_1),
and p_2 = (x_2, y_2) in screen coordinates. This is similar to the line drawing
problem, but it has some of its own subtleties. As with line drawing, we may
wish to interpolate color or other properties from values at the vertices. This is
straightforward if we have the barycentric coordinates (Section 2.7). For example,
if the vertices have colors c_0, c_1, and c_2, the color at a point in the triangle with
barycentric coordinates (α, β, γ) is

$$\mathbf{c} = \alpha\mathbf{c}_0 + \beta\mathbf{c}_1 + \gamma\mathbf{c}_2.$$

This type of interpolation of color is known in graphics as Gouraud interpolation 
after its inventor (Gouraud, 1971). 

Another subtlety of rasterizing triangles is that we are usually rasterizing
triangles that share vertices and edges. This means we would like to rasterize
adjacent triangles so there are no holes. We could do this by using the midpoint
algorithm to draw the outline of each triangle and then fill in the interior pixels.
This would mean adjacent triangles both draw the same pixels along each edge.
If the adjacent triangles have different colors, the image will depend on the order
in which the two triangles are drawn. The most common way to rasterize
triangles that avoids the order problem and eliminates holes is to use the convention
that pixels are drawn if and only if their centers are inside the triangle, i.e., the
barycentric coordinates of the pixel center are all in the interval $(0, 1)$. This raises
the issue of what to do if the center is exactly on the edge of the triangle. There
are several ways to handle this as will be discussed later in this section. The key
observation is that barycentric coordinates allow us to decide whether to draw a
pixel and what color that pixel should be if we are interpolating colors from the
vertices. So our problem of rasterizing the triangle boils down to efficiently
finding the barycentric coordinates of pixel centers (Pineda, 1988). The brute-force
rasterization algorithm is:
for all x do
    for all y do
        compute (α, β, γ) for (x, y)
        if (α ∈ [0, 1] and β ∈ [0, 1] and γ ∈ [0, 1]) then
            c = α c0 + β c1 + γ c2
            drawpixel (x, y) with color c

The rest of the algorithm limits the outer loops to a smaller set of candidate pixels
and makes the barycentric computation efficient.
We can add a simple efficiency by finding the bounding rectangle of the
three vertices and only looping over this rectangle for candidate pixels to draw.
We can compute barycentric coordinates using Equation (2.33). This yields the
algorithm:

x_min = floor(min(x0, x1, x2))
x_max = ceiling(max(x0, x1, x2))
y_min = floor(min(y0, y1, y2))
y_max = ceiling(max(y0, y1, y2))
for y = y_min to y_max do
    for x = x_min to x_max do
        α = f12(x, y) / f12(x0, y0)
        β = f20(x, y) / f20(x1, y1)
        γ = f01(x, y) / f01(x2, y2)
        if (α > 0 and β > 0 and γ > 0) then
            c = α c0 + β c1 + γ c2
            drawpixel (x, y) with color c

Here $f_{ij}$ is the line given by Equation (8.1) with the appropriate vertices:

$$f_{01}(x, y) = (y_0 - y_1)x + (x_1 - x_0)y + x_0 y_1 - x_1 y_0,$$

$$f_{12}(x, y) = (y_1 - y_2)x + (x_2 - x_1)y + x_1 y_2 - x_2 y_1,$$

$$f_{20}(x, y) = (y_2 - y_0)x + (x_0 - x_2)y + x_2 y_0 - x_0 y_2.$$
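The bounding-rectangle algorithm with these three line functions can be sketched in Python. This is a sketch under our own assumptions: vertices are (x, y) tuples, colors are RGB tuples, fragments are returned in a dictionary instead of being drawn, and the names `edge` and `rasterize_triangle` are hypothetical. The early return for degenerate triangles is an addition of ours to avoid a division by zero.

```python
import math


def edge(pa, pb, x, y):
    """Implicit line through pa and pb evaluated at (x, y), as in Equation (8.1)."""
    (xa, ya), (xb, yb) = pa, pb
    return (ya - yb) * x + (xb - xa) * y + xa * yb - xb * ya


def rasterize_triangle(p0, p1, p2, c0, c1, c2):
    """Map each covered pixel (x, y) to its barycentrically interpolated color."""
    f_alpha = edge(p1, p2, *p0)
    f_beta = edge(p2, p0, *p1)
    f_gamma = edge(p0, p1, *p2)
    if f_alpha == 0 or f_beta == 0 or f_gamma == 0:
        return {}  # degenerate triangle (zero area)

    xs, ys = zip(p0, p1, p2)
    fragments = {}
    for y in range(math.floor(min(ys)), math.ceil(max(ys)) + 1):
        for x in range(math.floor(min(xs)), math.ceil(max(xs)) + 1):
            alpha = edge(p1, p2, x, y) / f_alpha
            beta = edge(p2, p0, x, y) / f_beta
            gamma = edge(p0, p1, x, y) / f_gamma
            if alpha > 0 and beta > 0 and gamma > 0:
                fragments[(x, y)] = tuple(
                    alpha * a + beta * b + gamma * g
                    for a, b, g in zip(c0, c1, c2)
                )
    return fragments
```

With vertices (0, 0), (4, 0), (0, 4) colored red, green, and blue, only the three strictly interior pixel centers (1, 1), (1, 2), and (2, 1) produce fragments, because this version uses the strict test and skips pixels exactly on an edge.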

Figure 8.5. A colored triangle with barycentric interpolation. Note that the changes in color
components are linear in each row and column as well as along each edge. In fact, the rate of
change is constant along every line, such as the diagonals, as well. (See also Plate II.)

Figure 8.6. The off-screen point will be on one side of the triangle edge or the other.
Exactly one of the non-shared vertices a and b will be on the same side.

Note that we have exchanged the test $\alpha \in (0, 1)$ with $\alpha > 0$ etc., because if
all of $\alpha$, $\beta$, $\gamma$ are positive, then we know they are all less than one because
$\alpha + \beta + \gamma = 1$. We could also compute only two of the three barycentric variables
and get the third from that relation, but it is not clear that this saves computation
once the algorithm is made incremental, which is possible as in the line drawing
algorithms; each of the computations of $\alpha$, $\beta$, and $\gamma$ does an evaluation of the
form $f(x, y) = Ax + By + C$. In the inner loop, only $x$ changes, and it changes
by one. Note that $f(x + 1, y) = f(x, y) + A$. This is the basis of the incremental
algorithm. In the outer loop, the evaluation changes from $f(x, y)$ to $f(x, y + 1)$,
so a similar efficiency can be achieved. Because $\alpha$, $\beta$, and $\gamma$ change by constant
increments in the loop, so does the color $\mathbf{c}$. So this can be made incremental as
well. For example, the red value for pixel $(x + 1, y)$ differs from the red value
for pixel $(x, y)$ by a constant amount that can be precomputed. An example of a
triangle with color interpolation is shown in Figure 8.5.
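To make the incremental idea concrete, the following Python sketch evaluates one line function over a rectangle of pixels using a single full evaluation and then only additions. The helper names `edge_coefficients` and `scan_values` are hypothetical, and a real rasterizer would step all three line functions (and the color) this way instead of storing the values.

```python
def edge_coefficients(pa, pb):
    """Coefficients (A, B, C) of f(x, y) = A x + B y + C for the line through pa and pb."""
    (xa, ya), (xb, yb) = pa, pb
    return ya - yb, xb - xa, xa * yb - xb * ya


def scan_values(A, B, C, x_min, x_max, y_min, y_max):
    """Values of f at every integer (x, y) in the rectangle, one list per row."""
    rows = []
    row_start = A * x_min + B * y_min + C  # the only full evaluation
    for _y in range(y_min, y_max + 1):
        f = row_start
        row = []
        for _x in range(x_min, x_max + 1):
            row.append(f)
            f += A          # f(x + 1, y) = f(x, y) + A
        rows.append(row)
        row_start += B      # f(x, y + 1) = f(x, y) + B
    return rows
```

With floating-point coefficients the repeated additions can drift slightly from a direct evaluation, which is the same accumulation issue noted for incremental line drawing.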


Dealing with Pixels on Triangle Edges 

We have still not discussed what to do for pixels whose centers are exactly on 
the edge of a triangle. If a pixel is exactly on the edge of a triangle, then it is 
also on the edge of the adjacent triangle if there is one. There is no obvious way 
to award the pixel to one triangle or the other. The worst decision would be to 
not draw the pixel because a hole would result between the two triangles. Better, 
but still not good, would be to have both triangles draw the pixel. If the triangles 
are transparent, this will result in a double-coloring. We would really like to 
award the pixel to exactly one of the triangles, and we would like this process 
to be simple; which triangle is chosen does not matter as long as the choice is 
well defined. 

One approach is to note that any off-screen point is definitely on exactly one
side of the shared edge and that is the edge we will draw. For two non-overlapping
triangles, the vertices not on the edge are on opposite sides of the edge from each
other. Exactly one of these vertices will be on the same side of the edge as the
off-screen point (Figure 8.6). This is the basis of the test. The test of whether numbers $p$
and $q$ have the same sign can be implemented as the test $pq > 0$, which is very
efficient in most environments.

Note that the test is not perfect because the line through the edge may also
go through the off-screen point, but we have at least greatly reduced the number
of problematic cases. Which off-screen point is used is arbitrary, and $(x, y) =
(-1, -1)$ is as good a choice as any. We will need to add a check for the case of a
point exactly on an edge. We would like this check not to be reached for common
cases, which are the completely inside or outside tests. This suggests:

x_min = floor(min(x0, x1, x2))
x_max = ceiling(max(x0, x1, x2))
y_min = floor(min(y0, y1, y2))
y_max = ceiling(max(y0, y1, y2))
f_α = f12(x0, y0)
f_β = f20(x1, y1)
f_γ = f01(x2, y2)
for y = y_min to y_max do
    for x = x_min to x_max do
        α = f12(x, y) / f_α
        β = f20(x, y) / f_β
        γ = f01(x, y) / f_γ
        if (α ≥ 0 and β ≥ 0 and γ ≥ 0) then
            if (α > 0 or f_α f12(-1, -1) > 0) and (β > 0 or f_β f20(-1, -1) > 0)
                    and (γ > 0 or f_γ f01(-1, -1) > 0) then
                c = α c0 + β c1 + γ c2
                drawpixel (x, y) with color c

We might expect that the above code would work to eliminate holes and
double-draws only if we use exactly the same line equation for both triangles. In fact,
the line equation is the same only if the two shared vertices have the same order
in the draw call for each triangle. Otherwise the equation might flip in sign. This
could be a problem depending on whether the compiler changes the order of
operations. So if a robust implementation is needed, the details of the compiler and
arithmetic unit may need to be examined. The first four lines in the pseudocode
above must be coded carefully to handle cases where the edge exactly hits the
pixel center.
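The edge rule can be checked on a small case with the following Python sketch. It assumes integer vertex coordinates so that "exactly on an edge" is detected exactly; the names `line`, `covered_pixels`, and `OFF_SCREEN` are ours. The three sign tests against the off-screen point do not depend on the pixel, so this sketch evaluates them once per triangle, which is a small restructuring of the pseudocode.

```python
import math

OFF_SCREEN = (-1, -1)  # arbitrary reference point outside the image, as in the text


def line(pa, pb, x, y):
    """Implicit line through pa and pb evaluated at (x, y), as in Equation (8.1)."""
    (xa, ya), (xb, yb) = pa, pb
    return (ya - yb) * x + (xb - xa) * y + xa * yb - xb * ya


def covered_pixels(p0, p1, p2):
    """Pixels owned by the triangle, including edge pixels it wins by the rule."""
    f_alpha = line(p1, p2, *p0)
    f_beta = line(p2, p0, *p1)
    f_gamma = line(p0, p1, *p2)
    if f_alpha == 0 or f_beta == 0 or f_gamma == 0:
        return set()  # degenerate triangle: avoid dividing by zero

    # An edge is "ours" if the opposite vertex and OFF_SCREEN are on the same side.
    owns_alpha_edge = f_alpha * line(p1, p2, *OFF_SCREEN) > 0
    owns_beta_edge = f_beta * line(p2, p0, *OFF_SCREEN) > 0
    owns_gamma_edge = f_gamma * line(p0, p1, *OFF_SCREEN) > 0

    xs, ys = zip(p0, p1, p2)
    pixels = set()
    for y in range(math.floor(min(ys)), math.ceil(max(ys)) + 1):
        for x in range(math.floor(min(xs)), math.ceil(max(xs)) + 1):
            alpha = line(p1, p2, x, y) / f_alpha
            beta = line(p2, p0, x, y) / f_beta
            gamma = line(p0, p1, x, y) / f_gamma
            if alpha >= 0 and beta >= 0 and gamma >= 0:
                if ((alpha > 0 or owns_alpha_edge)
                        and (beta > 0 or owns_beta_edge)
                        and (gamma > 0 or owns_gamma_edge)):
                    pixels.add((x, y))
    return pixels
```

For the two triangles (0, 0), (4, 0), (0, 4) and (4, 0), (4, 4), (0, 4), which share the diagonal x + y = 4, the pixels (1, 3), (2, 2), and (3, 1) on the shared edge are drawn only by the first triangle, so there is neither a hole nor a double draw.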

In addition to being amenable to an incremental implementation, the algorithm has
several potential early exit points. For example, if $\alpha$ is negative, there is no need
to compute $\beta$ or $\gamma$. While this may well result in a speed improvement, profiling is
always a good idea; the extra branches could reduce pipelining or concurrency and
might slow down the code. So as always, test any attractive-looking optimizations
if the code is a critical section.

Another detail of the above code is that the divisions could be divisions by
zero for degenerate triangles, i.e., if $f_\gamma = 0$. Either the floating point error
conditions should be accounted for properly, or another test will be needed.


8.1.3 Clipping 

Simply transforming primitives into screen space and rasterizing them does not
quite work by itself. This is because primitives that are outside the view volume
(particularly, primitives that are behind the eye) can end up being rasterized,
leading to incorrect results. For instance, consider the triangle shown in Figure 8.7.
Two vertices are in the view volume, but the third is behind the eye. The
projection transformation maps this vertex to a nonsensical location behind the far plane,
and if this is allowed to happen the triangle will be rasterized incorrectly. For this
reason, rasterization has to be preceded by a clipping operation that removes parts
of primitives that could extend behind the eye.

Figure 8.7. The depth $z$ is transformed to the depth $z'$ by the perspective transform. Note
that when $z$ moves from positive to negative, $z'$ switches from negative to positive. Thus
vertices behind the eye are moved in front of the eye beyond $z' = n + f$. This will lead to
wrong results, which is why the triangle is first clipped to ensure all vertices are in front of the
eye.

Clipping is a common operation in graphics, needed whenever one geometric
entity “cuts” another. For example, if you clip a triangle against the plane $x = 0$,
the plane cuts the triangle into two parts if the signs of the $x$-coordinates of the
vertices are not all the same. In most applications of clipping, the portion of the
triangle on the “wrong” side of the plane is discarded. This operation for a single
plane is shown in Figure 8.8.

In clipping to prepare for rasterization, the “wrong” side is the side outside 
the view volume. It is always safe to clip away all geometry outside the view 
volume—that is, clipping against all six faces of the volume—but many systems 
manage to get away with only clipping against the near plane. 

This section discusses the basic implementation of a clipping module. Those 
interested in implementing an industrial-speed clipper should see the book by 
Blinn mentioned in the notes at the end of this chapter. 

The two most common approaches for implementing clipping are 

1. in world coordinates using the six planes that bound the truncated viewing 
pyramid, 

2. in the 4D transformed space before the homogeneous divide. 

Either possibility can be effectively implemented (J. Blinn, 1996) using the
following approach for each triangle:

for each of six planes do
    if (triangle entirely outside of plane) then
        break (triangle is not visible)
    else if triangle spans plane then
        clip triangle
        if (quadrilateral is left) then
            break into two triangles

8.1.4 Clipping Before the Transform (Option 1) 

Figure 8.8. A polygon is clipped against a clipping plane. The portion “inside”
the plane is retained.

Option 1 has a straightforward implementation. The only question is, “What are
the six plane equations?” Because these equations are the same for all triangles
rendered in the single image, we do not need to compute them very efficiently.
For this reason, we can just invert the transform shown in Figure 5.11 and apply
it to the eight vertices of the transformed view volume:

$$
\begin{aligned}
(x, y, z) = {} & (l, b, n) \\
& (r, b, n) \\
& (l, t, n) \\
& (r, t, n) \\
& (l, b, f) \\
& (r, b, f) \\
& (l, t, f) \\
& (r, t, f)
\end{aligned}
$$

The plane equations can be inferred from here. Alternatively, we can use vector 
geometry to get the planes directly from the viewing parameters. 


8.1.5 Clipping in Homogeneous Coordinates (Option 2) 

Surprisingly, the option usually implemented is that of clipping in homogeneous 
coordinates before the divide. Here the view volume is 4D, and it is bounded by 
3D volumes (hyperplanes). These are: 

$$
\begin{aligned}
-x + lw &= 0 \\
x - rw &= 0 \\
-y + bw &= 0 \\
y - tw &= 0 \\
-z + nw &= 0 \\
z - fw &= 0
\end{aligned}
$$

These planes are quite simple, so the efficiency is better than for Option 1. They
still can be improved by transforming the view volume $[l, r] \times [b, t] \times [f, n]$ to
$[0, 1]^3$. It turns out that the clipping of the triangles is not much more complicated
than in 3D.


8.1.6 Clipping against a Plane 

No matter which option we choose, we must clip against a plane. Recall from
Section 2.5.5 that the implicit equation for a plane through point $\mathbf{q}$ with normal
$\mathbf{n}$ is

$$f(\mathbf{p}) = \mathbf{n} \cdot (\mathbf{p} - \mathbf{q}) = 0.$$

This is often written

$$f(\mathbf{p}) = \mathbf{n} \cdot \mathbf{p} + D = 0. \qquad (8.2)$$

Interestingly, this equation not only describes a 3D plane, but it also describes a 
line in 2D and the volume analog of a plane in 4D. All of these entities are usually 
called planes in their appropriate dimension. 

If we have a line segment between points $\mathbf{a}$ and $\mathbf{b}$, we can “clip” it against
a plane using the techniques for cutting the edges of 3D triangles in BSP tree
programs described in Section 12.4.3. Here, the points $\mathbf{a}$ and $\mathbf{b}$ are tested to
determine whether they are on opposite sides of the plane $f(\mathbf{p}) = 0$ by checking
whether $f(\mathbf{a})$ and $f(\mathbf{b})$ have different signs. Typically $f(\mathbf{p}) < 0$ is defined to be
“inside” the plane, and $f(\mathbf{p}) > 0$ is “outside” the plane. If the plane does split the
line, then we can solve for the intersection point by substituting the equation for
the parametric line,

$$\mathbf{p} = \mathbf{a} + t(\mathbf{b} - \mathbf{a}),$$

into the $f(\mathbf{p}) = 0$ plane of Equation (8.2). This yields

$$\mathbf{n} \cdot (\mathbf{a} + t(\mathbf{b} - \mathbf{a})) + D = 0.$$

Solving for $t$ gives

$$t = \frac{\mathbf{n} \cdot \mathbf{a} + D}{\mathbf{n} \cdot (\mathbf{a} - \mathbf{b})}.$$

We can then find the intersection point and “shorten” the line.

To clip a triangle, we again can follow Section 12.4.3 to produce one or two
triangles.
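The plane test and the solution for $t$ translate directly into code. The Python sketch below is our own illustration, not the method of Section 12.4.3: `clip_segment` shortens a segment to the side $f(\mathbf{p}) \le 0$, and `clip_triangle` walks the three edges in the style of a single-plane Sutherland-Hodgman polygon clip, then splits a leftover quadrilateral into two triangles. Points and normals are plain tuples, and all function names are hypothetical.

```python
def plane_value(n, D, p):
    """f(p) = n . p + D; negative means inside, positive means outside."""
    return sum(ni * pi for ni, pi in zip(n, p)) + D


def intersect(a, b, fa, fb):
    """Point where segment ab crosses the plane, given f(a) and f(b) of opposite sign."""
    t = fa / (fa - fb)  # equals (n . a + D) / (n . (a - b))
    return tuple(ai + t * (bi - ai) for ai, bi in zip(a, b))


def clip_segment(a, b, n, D):
    """Return the part of segment ab with f <= 0, or None if it is entirely outside."""
    fa, fb = plane_value(n, D, a), plane_value(n, D, b)
    if fa > 0 and fb > 0:
        return None
    if fa <= 0 and fb <= 0:
        return a, b
    hit = intersect(a, b, fa, fb)
    return (a, hit) if fa <= 0 else (hit, b)


def clip_triangle(triangle, n, D):
    """Return zero, one, or two triangles covering the inside part of the input."""
    kept = []
    for i in range(3):
        a, b = triangle[i], triangle[(i + 1) % 3]
        fa, fb = plane_value(n, D, a), plane_value(n, D, b)
        if fa <= 0:
            kept.append(a)
        if (fa < 0 < fb) or (fb < 0 < fa):
            kept.append(intersect(a, b, fa, fb))
    if len(kept) < 3:
        return []
    # A fan from the first vertex turns a quadrilateral into two triangles.
    return [(kept[0], kept[i], kept[i + 1]) for i in range(1, len(kept) - 1)]
```

For the plane $x = 0$ (normal (1, 0, 0), $D = 0$), a triangle with two vertices at $x = -1$ and one at $x = 1$ leaves a quadrilateral and so yields two triangles, while a triangle with only one vertex inside yields one.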


8.2 Operations Before and After Rasterization 

Before a primitive can be rasterized, the vertices that define it must be in screen
coordinates, and the colors or other attributes that are supposed to be interpolated
across the primitive must be known. Preparing this data is the job of the
vertex-processing stage of the pipeline. In this stage, incoming vertices are transformed
by the modeling, viewing, and projection transformations, mapping them from
their original coordinates into screen space (where, recall, position is measured
in terms of pixels). At the same time, other information, such as colors, surface
normals, or texture coordinates, is transformed as needed; we’ll discuss these
additional attributes in the examples below.

After rasterization, further processing is done to compute a color and depth
for each fragment. This processing can be as simple as just passing through an
interpolated color and using the depth computed by the rasterizer; or it can involve
complex shading operations. Finally, the blending phase combines the fragments
generated by the (possibly several) primitives that overlapped each pixel to
compute the final color. The most common blending approach is to choose the color
of the fragment with the smallest depth (closest to the eye).

The purposes of the different stages are best illustrated by examples. 


8.2.1 Simple 2D Drawing 

The simplest possible pipeline does nothing in the vertex or fragment stages, and 
in the blending stage the color of each fragment simply overwrites the value of the 
previous one. The application supplies primitives directly in pixel coordinates, 
and the rasterizer does all the work. This basic arrangement is the essence of 
many simple, older APIs for drawing user interfaces, plots, graphs, and other 2D 
content. Solid color shapes can be drawn by specifying the same color for all 
vertices of each primitive, and our model pipeline also supports smoothly varying 
color using interpolation. 


8.2.2 A Minimal 3D Pipeline 



Figure 8.9. Occlusion cycles, which cannot be drawn in back-to-front order.


To draw objects in 3D, the only change needed to the 2D drawing pipeline is a 
single matrix transformation: the vertex-processing stage multiplies the incoming 
vertex positions by the product of the modeling, camera, projection, and viewport 
matrices, resulting in screen-space triangles that are then drawn in the same way 
as if they’d been specified directly in 2D. 

One problem with the minimal 3D pipeline is that in order to get occlusion
relationships correct (to get nearer objects in front of farther away objects),
primitives must be drawn in back-to-front order. This is known as the painter’s
algorithm for hidden surface removal, by analogy to painting the background of
a painting first, then painting the foreground over it. The painter’s algorithm is
a perfectly valid way to remove hidden surfaces, but it has several drawbacks.
It cannot handle triangles that intersect one another, because there is no correct
order in which to draw them. Similarly, several triangles, even if they don't
intersect, can still be arranged in an occlusion cycle, as shown in Figure 8.9, another
case in which the back-to-front order does not exist. And most importantly,
sorting the primitives by depth is slow, especially for large scenes, and disturbs the
efficient flow of data that makes object-order rendering so fast. Figure 8.10 shows
the result of this process when the objects are not sorted by depth.


8.2.3 Using a z-Buffer for Hidden Surfaces 

In practice the painter’s algorithm is rarely used; instead a simple and effective 
hidden surface removal algorithm known as the z-buffer algorithm is used. The 
method is very simple: at each pixel we keep track of the distance to the closest 
surface that has been drawn so far, and we throw away fragments that are farther 
away than that distance. The closest distance is stored by allocating an extra value 
for each pixel, in addition to the red, green, and blue color values, which is known 
as the depth, or z-value. The depth buffer, or z-buffer, is the name for the grid of 
depth values. 

The z-buffer algorithm is implemented in the fragment blending phase, by 
comparing the depth of each fragment with the current value stored in the z-buffer. 
If the fragment’s depth is closer, both its color and its depth value overwrite the 
values currently in the color and depth buffers. If the fragment’s depth is farther 
away, it is discarded. To ensure that the first fragment will pass the depth test, the
z-buffer is initialized to the maximum depth (the depth of the far plane). Irrespective
of the order in which surfaces are drawn, the same fragment will win the depth 
test, and the image will be the same. 

The z-buffer algorithm requires each fragment to carry a depth. This is done
simply by interpolating the $z$-coordinate as a vertex attribute, in the same way that
color or other attributes are interpolated.
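The depth test in the blending phase can be sketched in a few lines of Python. This sketch assumes that fragments arrive as (x, y, depth, color) tuples, that smaller depth means closer, and that the buffer is initialized to infinity as a stand-in for the far-plane depth; the name `resolve_fragments` is hypothetical.

```python
def resolve_fragments(fragments, width, height, far_depth=float("inf")):
    """Keep, at each pixel, the color of the closest fragment seen so far."""
    color_buffer = [[None] * width for _ in range(height)]
    depth_buffer = [[far_depth] * width for _ in range(height)]
    for x, y, depth, color in fragments:
        if depth < depth_buffer[y][x]:      # closer than anything drawn so far
            depth_buffer[y][x] = depth
            color_buffer[y][x] = color
        # otherwise the fragment is farther away and is discarded
    return color_buffer
```

Because each pixel keeps only the minimum depth, feeding the same fragments in reverse order gives the same image, provided no two fragments at a pixel have exactly equal depth.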

The z-buffer is such a simple and practical way to deal with hidden surfaces in 
object-order rendering that it is by far the dominant approach. It is much simpler 
than geometric methods that cut surfaces into pieces that can be sorted by depth, 
because it avoids solving any problems that don’t need to be solved. The depth 
order only needs to be determined at the locations of the pixels, and that is all 
that the z-buffer does. It is universally supported by hardware graphics pipelines 
and is also the most commonly used method for software pipelines. Figure 8.11 
shows an example result. 

Precision Issues 

In practice, the z-values stored in the buffer are non-negative integers. This is
preferable to true floats because the fast memory needed for the z-buffer is
somewhat expensive and is worth keeping to a minimum.

The use of integers can cause some precision problems. If we use an integer
range having $B$ values $\{0, 1, \ldots, B - 1\}$, we can map 0 to the near clipping plane
$z = n$ and $B - 1$ to the far clipping plane $z = f$. Note that for this discussion, we
assume $z$, $n$, and $f$ are positive. This will result in the same results as the negative
case, but the details of the argument are easier to follow. We send each $z$-value to
a “bucket” with depth $\Delta z = (f - n)/B$. We would not use the integer z-buffer if
memory were not a premium, so it is useful to make $B$ as small as possible.

If we allocate $b$ bits to store the $z$-value, then $B = 2^b$. We need enough bits
to make sure any triangle in front of another triangle will have its depth mapped
to distinct depth bins.

For example, if you are rendering a scene where triangles have a separation of
at least one meter, then $\Delta z < 1$ should yield images without artifacts. There are
two ways to make $\Delta z$ smaller: move $n$ and $f$ closer together or increase $b$. If $b$ is
fixed, as it may be in APIs or on particular hardware platforms, adjusting $n$ and $f$
is the only option.

Figure 8.10. The result of drawing two spheres of identical size using the minimal
pipeline. The sphere that appears smaller is farther away but is drawn last, so it
incorrectly overwrites the nearer one.

Of course there can be ties in the depth test, in which case the order may well
matter.

Figure 8.11. The result of drawing the same two spheres using the z-buffer.

Figure 8.12. A z-buffer rasterizing two triangles in each of two possible orders. The first
triangle is fully rasterized. The second triangle has every pixel computed, but for three of the
pixels the depth-contest is lost, and those pixels are not drawn. The final image is the same
regardless.

The precision of z-buffers must be handled with great care when perspective
images are created. The value $\Delta z$ above is used after the perspective divide.
Recall from Section 7.3 that the result of the perspective divide is

$$z = n + f - \frac{fn}{z_w}.$$

The actual bin depth is related to $z_w$, the world depth, rather than $z$, the
post-perspective divide depth. We can approximate the bin size by differentiating both
sides:

$$\Delta z \approx \frac{fn\,\Delta z_w}{z_w^2}.$$

Bin sizes vary in depth. The bin size in world space is

$$\Delta z_w \approx \frac{z_w^2\,\Delta z}{fn}.$$

Note that the quantity $\Delta z$ is as discussed before. The biggest bin will be for
$z_w = f$ (the far plane), where

$$\Delta z_w^{\max} \approx \frac{f\,\Delta z}{n}.$$

Note that choosing $n = 0$, a natural choice if we don’t want to lose objects right
in front of the eye, will result in an infinitely large bin, which is a very bad condition. To
make $\Delta z_w^{\max}$ as small as possible, we want to minimize $f$ and maximize $n$. Thus,
it is always important to choose $n$ and $f$ carefully.
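A quick numerical check shows how strongly the near plane affects precision. The Python sketch below simply evaluates the two approximations above for a $b$-bit buffer; the function names are ours, and the depths are assumed positive as in the discussion.

```python
def world_bin_size(z_w, n, f, bits):
    """Approximate world-space bin size at world depth z_w for a bits-deep z-buffer."""
    delta_z = (f - n) / 2 ** bits          # post-perspective bin size
    return z_w ** 2 * delta_z / (f * n)


def largest_world_bin(n, f, bits):
    """Approximate bin size at the far plane, where bins are largest."""
    delta_z = (f - n) / 2 ** bits
    return f * delta_z / n
```

With a 16-bit buffer and $f = 1000$, choosing $n = 1$ gives a largest bin of about 15.2 world units, while $n = 10$ reduces it to about 1.5.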


8.2.4 Per-vertex Shading 

So far the application sending triangles into the pipeline is responsible for setting
the color; the rasterizer just interpolates the colors and they are written directly
into the output image. For some applications this is sufficient, but in many cases
we want 3D objects to be drawn with shading, using the same illumination
equations that we used for image-order rendering in Chapter 4. Recall that these
equations require a light direction, an eye direction, and a surface normal to compute
the color of a surface.

Figure 8.13. Two spheres drawn using per-vertex (Gouraud) shading. Because the
triangles are large, interpolation artifacts are visible.

One way to handle shading computations is to perform them in the vertex 
stage. The application provides normal vectors at the vertices, and the positions 
and colors of the lights are provided separately (they don’t vary across the surface, 
so they don’t need to be specified for each vertex). For each vertex, the direction 
to the viewer and the direction to each light are computed based on the positions 
of the camera, the lights, and the vertex. The desired shading equation is evaluated 
to compute a color, which is then passed to the rasterizer as the vertex color.
Per-vertex shading is sometimes called Gouraud shading.

One decision to be made is the coordinate system in which shading
computations are done. World space or eye space are good choices. It is
important to choose a coordinate system that is orthonormal when viewed in world
space, because shading equations depend on angles between vectors, which are
not preserved by operations like nonuniform scale that are often used in the
modeling transformation, or perspective projection, often used in the projection to the
canonical view volume. Shading in eye space has the advantage that we don’t
need to keep track of the camera position, because in eye space the camera is always
at the origin (in perspective projection) or the view direction is always $+z$
(in orthographic projection).

Per-vertex shading has the disadvantage that it cannot produce any details in 
the shading that are smaller than the primitives used to draw the surface, because 
it only computes shading once for each vertex and never in between vertices. 
For instance, in a room with a floor that is drawn using two large triangles and 
illuminated by a light source in the middle of the room, shading will be evaluated 
only at the corners of the room, and the interpolated value will likely be much too 
dark in the center. Also, curved surfaces that are shaded with specular highlights 
must be drawn using primitives small enough that the highlights can be resolved. 

Figure 8.13 shows our two spheres drawn with per-vertex shading. 


Per-fragment shading is 
sometimes called Phong 
shading, which is confusing 
because the same name 
is attached to the Phong 
illumination model. 


8.2.5 Per-fragment Shading 

To avoid the interpolation artifacts associated with per-vertex shading, we can
avoid interpolating colors by performing the shading computations after the
interpolation, in the fragment stage. In per-fragment shading, the same shading
equations are evaluated, but they are evaluated for each fragment using
interpolated vectors, rather than for each vertex using the vectors from the application.

In per-fragment shading the geometric information needed for shading is
passed through the rasterizer as attributes, so the vertex stage must coordinate
with the fragment stage to prepare the data appropriately. One approach is to
interpolate the eye-space surface normal and the eye-space vertex position, which
then can be used just as they would in per-vertex shading.

Figure 8.14 shows our two spheres drawn with per-fragment shading.
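The difference between the two shading frequencies shows up clearly in the floor example from the per-vertex discussion. The Python sketch below is a toy setup of our own: one large floor triangle, a point light half a unit above the middle of the floor, and a simple diffuse term max(0, n . l) standing in for the shading equation. All names and numbers are illustrative assumptions.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)


def diffuse(normal, position, light_position):
    """Diffuse term max(0, n . l) for a point light."""
    to_light = normalize(tuple(l - p for l, p in zip(light_position, position)))
    return max(0.0, dot(normalize(normal), to_light))


def blend(weights, vectors):
    """Barycentric interpolation of three 3D vectors."""
    return tuple(sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(3))


corners = [(-1.0, 0.0, -1.0), (1.0, 0.0, -1.0), (1.0, 0.0, 1.0)]  # floor triangle
normals = [(0.0, 1.0, 0.0)] * 3
light = (0.0, 0.5, 0.0)            # just above the middle of the floor
weights = (0.5, 0.0, 0.5)          # barycentric coordinates of the floor's center

# Per-vertex: shade at the corners, then interpolate the resulting values.
per_vertex = sum(w * diffuse(n, p, light)
                 for w, n, p in zip(weights, normals, corners))

# Per-fragment: interpolate normal and position, then shade.
per_fragment = diffuse(blend(weights, normals), blend(weights, corners), light)
```

Here the per-vertex value at the center is 1/3, the same as at the corners, whereas shading the interpolated position and normal gives the full intensity of 1 directly under the light.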


8.2.6 Texture Mapping 

Textures (discussed in Chapter 11) are images that are used to add extra detail to 
the shading of surfaces that would otherwise look too homogeneous and artificial. 
The idea is simple: each time shading is computed, we read one of the values 
used in the shading computation—the diffuse color, for instance—from a texture 
instead of using the attribute values that are attached to the geometry being
rendered. This operation is known as a texture lookup: the shading code specifies a
texture coordinate, a point in the domain of the texture, and the texture-mapping 
system finds the value at that point in the texture image and returns it. The texture 
value is then used in the shading computation. 

The most common way to define texture coordinates is simply to make the 
texture coordinate another vertex attribute. Each primitive then knows where it 
lives in the texture. 
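A texture lookup in its simplest form can be sketched as follows. This Python sketch assumes texture coordinates (u, v) in the unit square, a texture stored as a list of rows, and nearest-texel selection with clamping at the borders; these conventions and the name `texture_lookup` are assumptions of ours, and Chapter 11 treats lookups in more depth.

```python
def texture_lookup(texture, u, v):
    """Return the texel nearest to (u, v), with (u, v) expected in [0, 1] x [0, 1]."""
    height, width = len(texture), len(texture[0])
    i = max(0, min(int(u * width), width - 1))    # column index, clamped
    j = max(0, min(int(v * height), height - 1))  # row index, clamped
    return texture[j][i]
```

In a per-fragment pipeline, (u, v) would be the interpolated texture-coordinate attribute of the fragment, and the returned value would replace, say, the diffuse color in the shading computation.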



Figure 8.14. Two spheres 
drawn using per-fragment 
shading. Because the triangles
are large, interpolation
artifacts are visible. 


8.2.7 Shading Frequency 

The decision about where to place shading computations depends on how fast the 
color changes—the scale of the details being computed. Shading with large-scale 
features, such as diffuse shading on curved surfaces, can be evaluated fairly
infrequently and then interpolated: it can be computed with a low shading frequency.
Shading that produces small-scale features, such as sharp highlights or detailed 
textures, needs to be evaluated at a high shading frequency. For details that need 
to look sharp and crisp in the image, the shading frequency needs to be at least 
one shading sample per pixel. 

So large-scale effects can safely be computed in the vertex stage, even when
the vertices defining the primitives are many pixels apart. Effects that require a
high shading frequency can also be computed at the vertex stage, as long as the
vertices are close together in the image; alternatively, they can be computed at the
fragment stage when primitives are larger than a pixel.

For example, a hardware pipeline as used in a computer game, generally
using primitives that cover several pixels to ensure high efficiency, normally does
most shading computations per fragment. On the other hand, the PhotoRealistic
RenderMan system does all shading computations per vertex, after first
subdividing, or dicing, all surfaces into small quadrilaterals called micropolygons that are
about the size of pixels. Since the primitives are small, per-vertex shading in this
system achieves a high shading frequency that is suitable for detailed shading.


8.3 Simple Antialiasing 


There are better filters than 
the box, but a box filter will 
suffice for all but the most 
demanding applications. 


Just as with ray tracing, rasterization will produce jagged lines and triangle edges 
if we make an all-or-nothing determination of whether each pixel is inside the 
primitive or not. In fact, the set of fragments generated by the simple triangle 
rasterization algorithms described in this chapter, sometimes called standard or 
aliased rasterization, is exactly the same as the set of pixels that would be mapped 
to that triangle by a ray tracer that sends one ray through the center of each pixel. 
Also as in ray tracing, the solution is to allow pixels to be partly covered by a 
primitive (Crow, 1978). In practice this form of blurring helps visual quality, 
especially in animations. This is shown as the top line of Figure 8.15. 

There are a number of different approaches to antialiasing in rasterization 
applications. Just as with a ray tracer, we can produce an antialiased image by 
setting each pixel value to the average color of the image over the square area 
belonging to the pixel, an approach known as box filtering. This means we have 
to think of all drawable entities as having well-defined areas. For example, the line 
in Figure 8.15 can be thought of as approximating a one-pixel-wide rectangle. 



Figure 8.15. An antialiased and a jaggy line viewed at close range so individual pixels are
visible.

The easiest way to implement box-filter antialiasing is by supersampling:
create images at very high resolutions and then downsample. For example, if our
goal is a 256 × 256 pixel image of a line with width 1.2 pixels, we could rasterize
a rectangle version of the line with width 4.8 pixels on a 1024 × 1024 screen,
and then average 4 × 4 groups of pixels to get the colors for each of the 256 ×
256 pixels in the “shrunken” image. This is an approximation of the actual
box-filtered image, but works well when objects are not extremely small relative to the
distance between pixels.
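The downsampling step is just a block average. The following Python sketch assumes a grayscale image stored as a list of rows whose dimensions are exact multiples of the supersampling factor; the name `downsample` is ours.

```python
def downsample(image, factor):
    """Average each factor x factor block of a high-resolution image into one pixel."""
    height, width = len(image), len(image[0])
    result = []
    for y in range(0, height, factor):
        row = []
        for x in range(0, width, factor):
            block = [image[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        result.append(row)
    return result
```

A 4 × 4 block whose left half is covered by a white primitive averages to 0.5, which is the partial coverage that the box filter is meant to capture.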

Supersampling is quite expensive, however. Because the very sharp edges 
that cause aliasing are normally caused by the edges of primitives, rather than 
sudden variations in shading within a primitive, a widely used optimization is 
to sample visibility at a higher rate than shading. If information about coverage 
and depth is stored for several points within each pixel, very good antialiasing 
can be achieved even if only one color is computed. In systems like RenderMan 
that use per-vertex shading, this is achieved by rasterizing at high resolution: it is 
inexpensive to do so because shading is simply interpolated to produce colors for 
the many fragments, or visibility samples. In systems with per-fragment shading, 
such as hardware pipelines, multisample antialiasing is achieved by storing for 
each fragment a single color plus a coverage mask and a set of depth values. 


8.4 Culling Primitives for Efficiency 

The strength of object-order rendering, that it requires a single pass over all the 
geometry in the scene, is also a weakness for complex scenes. For instance, in a 
model of an entire city, only a few buildings are likely to be visible at any given 
time. A correct image can be obtained by drawing all the primitives in the scene, 
but a great deal of effort will be wasted processing geometry that is behind the 
visible buildings, or behind the viewer, and therefore doesn't contribute to the 
final image. 

Identifying and throwing away invisible geometry to save the time that would 
be spent processing it is known as culling. Three commonly implemented culling 
strategies (often used in tandem) are: 

• view volume culling— the removal of geometry that is outside the view 
volume; 

• occlusion culling— the removal of geometry that may be within the view 
volume but is obscured, or occluded, by other geometry closer to the 
camera; 

• backface culling— the removal of primitives facing away from the camera. 
We will briefly discuss view volume culling and backface culling, but culling
in high-performance systems is a complex topic; see (Akenine-Möller et al., 2008)
for a complete discussion and for information about occlusion culling.


8.4.1 View Volume Culling 

When an entire primitive lies outside the view volume, it can be culled, since it 
will produce no fragments when rasterized. If we can cull many primitives with a 
quick test, we may be able to speed up drawing significantly. On the other hand, 
testing primitives individually to decide exactly which ones need to be drawn may 
cost more than just letting the rasterizer eliminate them. 

View volume culling, also known as view frustum culling, is especially helpful
when many triangles are grouped into an object with an associated bounding
volume. If the bounding volume lies outside the view volume, then so do all the
triangles that make up the object. For example, if we have 1000 triangles bounded
by a single sphere with center c and radius r, we can check whether the sphere
lies outside the clipping plane

$$(\mathbf{p} - \mathbf{a}) \cdot \mathbf{n} = 0,$$

where a is a point on the plane, and p is a variable. This is equivalent to checking
whether the signed distance from the center of the sphere c to the plane is greater
than +r. This amounts to the check that

$$\frac{(\mathbf{c} - \mathbf{a}) \cdot \mathbf{n}}{\|\mathbf{n}\|} > r.$$

Note that the sphere may overlap the plane even in a case where all the triangles 
do lie outside the plane. Thus, this is a conservative test. How conservative the 
test is depends on how well the sphere bounds the object. 
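
The signed-distance check for a bounding sphere against one clipping plane might look like the following sketch. The function name and the tuple representation of points are hypothetical, and we assume the plane normal n points toward the outside of the view volume; n does not need to be unit length.

```python
import math


def sphere_outside_plane(c, r, a, n):
    """Return True if the sphere with center c and radius r lies entirely
    on the outside of the plane through point a with outward normal n."""
    # Numerator of the signed distance: (c - a) . n
    dot = sum((ci - ai) * ni for ci, ai, ni in zip(c, a, n))
    length = math.sqrt(sum(ni * ni for ni in n))
    return dot / length > r


# Plane x = 0 with outward normal along +x (deliberately not normalized).
plane_point, plane_normal = (0.0, 0.0, 0.0), (2.0, 0.0, 0.0)
far_outside = sphere_outside_plane((3.0, 0.0, 0.0), 1.0, plane_point, plane_normal)
straddling = sphere_outside_plane((0.5, 0.0, 0.0), 1.0, plane_point, plane_normal)
```

A sphere that straddles the plane is reported as not outside, so its triangles are still sent down the pipeline; this is what makes the test conservative.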

The same idea can be applied hierarchically if the scene is organized in one 
of the spatial data structures described in Chapter 12. 


8.4.2 Backface Culling 

When polygonal models are closed, i.e., they bound a closed space with no holes, 
then they are often assumed to have outward facing normal vectors as discussed 
in Chapter 10. For such models, the polygons that face away from the eye are 
certain to be overdrawn by polygons that face the eye. Thus, those polygons can 
be culled before the pipeline even starts. The test for this condition is the same 
one used for silhouette drawing given in Section 10.3.1. 
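
As a rough sketch, a common form of the backface test compares the polygon's outward normal with the direction from the polygon to the eye. The function name and argument layout below are our own assumptions, and the exact form used in Section 10.3.1 may differ in detail.

```python
def is_backfacing(p, n, e):
    """Return True if a polygon faces away from the eye.

    p is any point on the polygon, n its outward-facing normal, and e the
    eye position. The polygon is back-facing when n makes an angle of more
    than 90 degrees with the vector from p to e.
    """
    to_eye = [ei - pi for ei, pi in zip(e, p)]
    return sum(ni * ti for ni, ti in zip(n, to_eye)) < 0.0


# A polygon in the plane z = 0 whose outward normal points along +z.
point, normal = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
seen_from_front = is_backfacing(point, normal, (0.0, 0.0, 5.0))
seen_from_behind = is_backfacing(point, normal, (0.0, 0.0, -5.0))
```

The strict inequality keeps polygons that are exactly edge-on; culling those as well would also be safe, since they produce no fragments.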
Frequently Asked Questions


• I’ve often seen clipping discussed at length, and it is a much more involved
process than that described in this chapter. What is going on here?

The clipping described in this chapter works, but lacks optimizations that an
industrial-strength clipper would have. These optimizations are discussed in detail
in Blinn’s definitive work listed in the chapter notes.

• How are polygons that are not triangles rasterized? 

These can either be done directly scan-line by scan-line, or they can be broken 
down into triangles. The latter appears to be the more popular technique. 

• Is it always better to antialias? 

No. Some images look crisper without antialiasing. Many programs use unantialiased
“screen fonts” because they are easier to read.

• The documentation for my API talks about “scene graphs” and “matrix 
stacks.” Are these part of the graphics pipeline? 

The graphics pipeline is certainly designed with these in mind, and whether we 
define them as part of the pipeline is a matter of taste. This book delays their 
discussion until Chapter 12. 

• Is a uniform distance z-buffer better than the standard one that includes 
perspective matrix non-linearities? 

It depends. One “feature” of the non-linearities is that the z-buffer has more resolution
near the eye and less in the distance. If a level-of-detail system is used,
then geometry in the distance is coarser and the “unfairness” of the z-buffer can
be a good thing.

• Is a software z-buffer ever useful? 

Yes. Most of the movies that use 3D computer graphics have used a variant of the 
software z-buffer developed by Pixar (Cook et al., 1987). 
Notes

A wonderful book about designing a graphics pipeline is Jim Blinn's Corner: 
A Trip Down the Graphics Pipeline (J. Blinn, 1996). Many nice details of the 
pipeline and culling are in 3D Game Engine Design (Eberly, 2000) and Real-Time 
Rendering (Akenine-Möller et al., 2008).

Exercises 

1. Suppose that in the perspective transform we have n = 1 and f = 2. Under
what circumstances will we have a “reversal” where a vertex before and 
after the perspective transform flips from in front of to behind the eye or 
vice-versa? 

2. Is there any reason not to clip in x and y after the perspective divide (see 
Figure 11.2, stage 3)? 

3. Derive the incremental form of the midpoint line-drawing algorithm with 
colors at endpoints for 0 < m < 1. 

4. Modify the triangle-drawing algorithm so that it will draw exactly one pixel 
for points on a triangle edge which goes through (x,y) = (-1,-1). 

5. Suppose you are designing an integer z-buffer for flight simulation where 
all of the objects are at least one meter thick, are never closer to the viewer 
than 4 meters, and may be as far away as 100 km. How many bits are 
needed in the z-buffer to ensure there are no visibility errors? Suppose that 
visibility errors only matter near the viewer, i.e., for distances less than 100 
meters. How many bits are needed in that case? 
Signal Processing


In graphics, we often deal with functions of a continuous variable: an image is
the first example you have seen, but you will encounter many more as you continue
your exploration of graphics. By their nature continuous functions can’t be
directly represented in a computer; we have to somehow represent them using
a finite number of bits. One of the most useful approaches to representing continuous
functions is to use samples of the function: just store the values of the
function at many different points and reconstruct the values in between when and
if they are needed.

You are by now familiar with the idea of representing an image using a two-dimensional
grid of pixels, so you have already seen a sampled representation!
Think of an image captured by a digital camera: the actual image of the scene that
was formed by the camera’s lens is a continuous function of the position on the
image plane, and the camera converted that function into a two-dimensional grid
of samples. Mathematically, the camera converted a function of type $\mathbb{R}^2 \to C$
(where C is the set of colors) to a two-dimensional array of color samples, or a
function of type $\mathbb{Z}^2 \to C$.

Another example of a sampled representation is a 2D digitizing tablet such
as the screen of a tablet computer or PDA. In this case the original function is
the motion of the stylus, which is a time-varying 2D position, or a function of
type $\mathbb{R} \to \mathbb{R}^2$. The digitizer measures the position of the stylus at many points in
time, resulting in a sequence of 2D coordinates, or a function of type $\mathbb{Z} \to \mathbb{R}^2$. A
motion capture system does exactly the same thing for a special marker attached
to an actor’s body: it takes the 3D position of the marker over time ($\mathbb{R} \to \mathbb{R}^3$) and
makes it into a series of instantaneous position measurements ($\mathbb{Z} \to \mathbb{R}^3$).

Going up in dimension, a medical CT scanner, used to non-invasively examine
the interior of a person’s body, measures density as a function of position inside
the body. The output of the scanner is a 3D grid of density values: it converts the
density of the body ($\mathbb{R}^3 \to \mathbb{R}$) to a 3D array of real numbers ($\mathbb{Z}^3 \to \mathbb{R}$).

These examples seem different, but in fact they can all be handled using exactly
the same mathematics. In all cases a function is being sampled at the points
of a lattice in one or more dimensions, and in all cases we need to be able to
reconstruct that original continuous function from the array of samples.

From the example of a 2D image, it may seem that the pixels are enough, 
and we never need to think about continuous functions again once the camera has 
discretized the image. But what if we want to make the image larger or smaller on 
the screen, particularly by non-integer scale factors? It turns out that the simplest 
algorithms to do this perform badly, introducing obvious visual artifacts known 
as aliasing. Explaining why aliasing happens and understanding how to prevent it 
requires the mathematics of sampling theory. The resulting algorithms are rather 
simple, but the reasoning behind them, and the details of making them perform 
well, can be subtle. 

Representing continuous functions in a computer is, of course, not unique to
graphics; nor is the idea of sampling and reconstruction. Sampled representations
are used in applications from digital audio to computational physics, and graphics
is just one (and by no means the first) user of the related algorithms and mathematics.
The fundamental facts about how to do sampling and reconstruction have
been known in the field of communications since the 1920s and were stated in
exactly the form we use them by the 1940s (Shannon & Weaver, 1964).

This chapter starts by summarizing sampling and reconstruction using the 
concrete one-dimensional example of digital audio. Then we go on to present 
the basic mathematics and algorithms that underlie sampling and reconstruction 
in one and two dimensions. Finally we go into the details of the frequency-domain 
viewpoint, which provides many insights into the behavior of these algorithms. 

9.1 Digital Audio: Sampling in 1D 

Although sampled representations had already been in use for years in telecommunications,
the introduction of the compact disc in 1982, following the increased
use of digital recording for audio in the previous decade, was the first highly visible
consumer application of sampling.

Figure 9.1. Sampling and reconstruction in digital audio.


In audio recording, a microphone converts sound, which exists as pressure 
waves in the air, into a time-varying voltage that amounts to a measurement of the 
changing air pressure at the point where the microphone is located. This electrical 
signal needs to be stored somehow so that it may be played back at a later time 
and sent to a loudspeaker that converts the voltage back into pressure waves by 
moving a diaphragm in synchronization with the voltage. 

The digital approach to recording the audio signal (Figure 9.1) uses sampling:
an analog-to-digital converter (A/D converter, or ADC) measures the voltage
many thousand times per second, generating a stream of integers that can easily
be stored on any number of media, say a disk on a computer in the recording
studio, or transmitted to another location, say the memory in a portable audio
player. At playback time, the data is read out at the appropriate rate and sent to a
digital-to-analog converter (D/A converter, or DAC). The DAC produces a voltage
according to the numbers it receives, and, provided we take enough samples
to fairly represent the variation in voltage, the resulting electrical signal is, for all
practical purposes, identical to the input.

It turns out that the number of samples per second required to end up with a 
good reproduction depends on how high-pitched the sounds are that we are trying 
to record. A sample rate that works fine for reproducing a string bass or a kick 
drum produces bizarre-sounding results if we try to record a piccolo or a cymbal; 
but those sounds are reproduced just fine with a higher sample rate. To avoid these 
undersampling artifacts the digital audio recorder filters the input to the ADC to 
remove high frequencies that can cause problems. 

Another kind of problem arises on the output side. The DAC produces a 
voltage that changes whenever a new sample comes in, but stays constant until 
the next sample, producing a stair-step shaped graph. These stair-steps act like 
noise, adding a high-frequency, signal-dependent buzzing sound. To remove this 
reconstruction artifact, the digital audio player filters the output from the DAC to 
smooth out the waveform. 
Figure 9.2. A sine wave (gray curve) sampled at two different rates. Top: at a high sample
rate, the resulting samples (black dots) represent the signal well. Bottom: a lower sample 
rate produces an ambiguous result: the samples are exactly the same as would result from 
sampling a wave of much lower frequency (dashed curve). 

9.1.1 Sampling Artifacts and Aliasing 

The digital audio recording chain can serve as a concrete model for the sampling
and reconstruction processes that happen in graphics. The same kind of undersampling
and reconstruction artifacts also happen with images or other sampled
signals in graphics, and the solution is the same: filtering before sampling and
filtering again during reconstruction.

A concrete example of the kind of artifacts that can arise from too-low sample 
frequencies is shown in Figure 9.2. Here we are sampling a simple sine wave 
using two different sample frequencies: 10.8 samples per cycle on the top and 
1.2 samples per cycle on the bottom. The higher rate produces a set of samples 
that obviously capture the signal well, but the samples resulting from the lower 
sample rate are indistinguishable from samples of a low-frequency sine wave; in
fact, faced with this set of samples the low-frequency sinusoid seems the more
likely interpretation.

Once the sampling has been done, it is impossible to know which of the two
signals, the fast or the slow sine wave, was the original, and therefore there is
no single method that can properly reconstruct the signal in both cases. Because
the high-frequency signal is “pretending to be” a low-frequency signal, this phenomenon
is known as aliasing.
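
The ambiguity is easy to reproduce numerically. In the sketch below (our own construction, assuming both waves start at phase zero, which need not match the figure), a sine sampled at 1.2 samples per cycle has a frequency of 1/1.2 = 5/6 cycles per sample. Its samples coincide with those of an inverted sine at 1 − 5/6 = 1/6 cycles per sample, that is, a wave with 6 samples per cycle.

```python
import math


def sample_sine(cycles_per_sample, count, amplitude=1.0):
    """Sample amplitude * sin(2 pi f n) at the integers n = 0 .. count-1."""
    return [amplitude * math.sin(2.0 * math.pi * cycles_per_sample * n)
            for n in range(count)]


fast = sample_sine(1.0 / 1.2, 12)        # 1.2 samples per cycle
slow = sample_sine(1.0 / 6.0, 12, -1.0)  # 6 samples per cycle, inverted

# Largest disagreement between the two sample sets (rounding error only).
max_diff = max(abs(u - v) for u, v in zip(fast, slow))
```

Because the two lists agree to within floating-point rounding, nothing computed from the samples alone can tell the two signals apart.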

Aliasing shows up whenever flaws in sampling and reconstruction lead to artifacts
at surprising frequencies. In audio, aliasing takes the form of odd-sounding
extra tones: a bell ringing at 10 kHz, after being sampled at 8 kHz, turns into a
2 kHz tone. In images, aliasing often takes the form of moiré patterns that result
from the interaction of the sample grid with regular features in an image, for
instance the window blinds in Figure 9.34.

Another example of aliasing in a synthetic image is the familiar stair-stepping 
on straight lines that are rendered with only black and white pixels (Figure 9.34). 
This is an example of small-scale features (the sharp edges of the lines) creating 
artifacts at a different scale (for shallow-slope lines the stair steps are very long). 

The basic issues of sampling and reconstruction can be understood simply
based on features being too small or too large, but some more quantitative questions
are harder to answer:

• What sample rate is high enough to ensure good results? 

• What kinds of filters are appropriate for sampling and reconstruction? 

• What degree of smoothing is required to avoid aliasing? 

Solid answers to these questions will have to wait until we have developed the
theory fully in Section 9.5.


9.2 Convolution 

Before we discuss algorithms for sampling and reconstruction, we’ll first examine
the mathematical concept on which they are based: convolution. Convolution
is a simple mathematical concept that underlies the algorithms that are used for
sampling, filtering, and reconstruction. It also is the basis of how we will analyze
these algorithms later in the chapter.

Convolution is an operation on functions: it takes two functions and combines
them to produce a new function. In this book, the convolution operator is denoted
by a star: the result of applying convolution to the functions f and g is $f \star g$. We
say that f is convolved with g, and $f \star g$ is the convolution of f and g.

Convolution can be applied either to continuous functions (functions f(x) that
are defined for any real argument x) or to discrete sequences (functions a[i] that
are defined only for integer arguments i). It can also be applied to functions defined
on one-dimensional, two-dimensional, or higher-dimensional domains (that
is, functions of one, two, or more arguments). We will start with the discrete,
one-dimensional case first, then continue to continuous functions and two- and
three-dimensional functions.

For convenience in the definitions, we generally assume that the functions’
domains go on forever, though of course in practice they will have to stop somewhere,
and we have to handle the end points in a special way.
9.2.1 Moving Averages

To get a basic picture of convolution, consider the example of smoothing a 1D
function using a moving average (Figure 9.3). To get a smoothed value at any
point, we compute the average of the function over a range extending a distance
r in each direction. The distance r, called the radius of the smoothing operation,
is a parameter that controls how much smoothing happens.

We can state this idea mathematically for discrete or continuous functions. If
we’re smoothing a continuous function g(x), averaging means integrating g over
an interval and then dividing by the length of the interval:

$$h(x) = \frac{1}{2r} \int_{x-r}^{x+r} g(t)\, dt.$$

On the other hand, if we’re smoothing a discrete function b[i], averaging means
summing b for a range of indices and dividing by the number of values:

$$c[i] = \frac{1}{2r+1} \sum_{j=i-r}^{i+r} b[j]. \tag{9.1}$$

In each case, the normalization constant is chosen so that if we smooth a constant 
function the result will be the same function. 
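
A minimal sketch of the discrete moving average follows. The function name is ours, and treating samples beyond the ends of the list as zero is just one assumed way of handling the end points.

```python
def moving_average(b, r):
    """Smooth the list b with a discrete moving average of radius r.

    Samples that fall outside the list are treated as zero, which is an
    assumption of this sketch rather than the only reasonable choice.
    """
    n = len(b)
    out = []
    for i in range(n):
        total = 0.0
        for j in range(i - r, i + r + 1):
            if 0 <= j < n:
                total += b[j]
        # Always divide by the full window size 2r + 1.
        out.append(total / (2 * r + 1))
    return out


smoothed = moving_average([0, 0, 3, 0, 0, 6, 6, 6], 1)
# smoothed == [0.0, 1.0, 1.0, 1.0, 2.0, 4.0, 6.0, 4.0]
```

The isolated spike of height 3 is spread over three samples of height 1, while the last value drops to 4 only because of the zero padding at the end of the list.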

This idea of a moving average is the essence of convolution; the only difference
is that in convolution the moving average is a weighted average.


9.2.2 Discrete Convolution 

We will start with the most concrete case of convolution: convolving a discrete
sequence a[i] with another discrete sequence b[i]. The result is a discrete sequence
$(a \star b)[i]$. The process is just like smoothing b with a moving average, but this
time instead of equally weighting all samples within a distance r, we use a second
sequence a to give a weight to each sample (Figure 9.4). The value a[j] gives the
weight for a sample that is a distance j from the index i where we are evaluating
the convolution. Here is the definition of $(a \star b)$, expressed as a formula:

$$(a \star b)[i] = \sum_j a[j]\, b[i-j]. \tag{9.2}$$

By omitting bounds on j, we indicate that this sum runs over all integers (that
is, from −∞ to +∞). Figure 9.4 illustrates how one output sample is computed,
using the example of $a = \frac{1}{16}[\ldots, 0, 1, 4, 6, 4, 1, 0, \ldots]$, that is, $a[0] = \frac{6}{16}$,
$a[\pm 1] = \frac{4}{16}$, etc.

Figure 9.4. Computing one value in the discrete convolution of a sequence b with a filter a
that has support five samples wide. Each sample in $a \star b$ is an average of nearby samples
in b, weighted by the values of a.

In graphics, one of the two functions will usually have finite support (as does
the example in Figure 9.4), which means that it is non-zero only over a finite
interval of argument values. If we assume that a has finite support, there is some
radius r such that a[j] = 0 whenever |j| > r. In that case, we can write the sum
above as

$$(a \star b)[i] = \sum_{j=-r}^{r} a[j]\, b[i-j],$$

and we can express the definition in code as 


    function convolve(sequence a, sequence b, int r, int i)
        s = 0
        for j = −r to r
            s = s + a[j] b[i − j]
        return s
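
For readers who want something runnable, here is one possible Python rendering of the pseudocode. Python lists cannot be indexed from −r, so this sketch stores the weight a[j] at list position j + r; that storage convention, the name `convolve_at`, and the zero padding of b are all our assumptions.

```python
def convolve_at(a, b, r, i):
    """Evaluate the discrete convolution of a and b at index i.

    a holds the 2r + 1 filter weights, with a[j] stored at position j + r.
    b is a plain list; samples outside it are treated as zero.
    """
    s = 0.0
    for j in range(-r, r + 1):
        k = i - j  # index into the signal
        if 0 <= k < len(b):
            s += a[j + r] * b[k]
    return s


weights = [w / 16 for w in (1, 4, 6, 4, 1)]  # the filter of Figure 9.4
signal = [0, 0, 16, 0, 0]
center = convolve_at(weights, signal, 2, 2)    # 6.0
neighbor = convolve_at(weights, signal, 2, 3)  # 4.0
```

Convolving with a signal that has a single non-zero sample simply reads back the filter weights, scaled by that sample.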


Figure 9.5. A discrete box filter.


Convolution Filters 

Convolution is important because we can use it to perform filtering. Looking back
at our first example of filtering, the moving average, we can now reinterpret that
smoothing operation as convolution with a particular sequence. When we compute
an average over some limited range of indices, that is the same as weighting
the points in the range all identically and weighting the rest of the points with
zeros. This kind of filter, which has a constant value over the interval where it is
non-zero, is known as a box filter (because it looks like a rectangle if you draw its
graph; see Figure 9.5). For a box filter of radius r the weight is 1/(2r + 1):

$$a[j] = \begin{cases} \dfrac{1}{2r+1} & -r \le j \le r, \\ 0 & \text{otherwise.} \end{cases}$$


If you substitute this filter into Equation (9.2), you will find that it reduces to the 
moving average in Equation (9.1). 

As in this example, convolution filters are usually designed so that they sum 
to 1. That way, they don’t affect the overall level of the signal. 

Example (Convolution of a box and a step). For a simple example of filtering, let
the signal be the step function

$$b[i] = \begin{cases} 1 & i \ge 0, \\ 0 & i < 0, \end{cases}$$

and the filter be the five-point box filter centered at zero,

$$a[j] = \frac{1}{5} \begin{cases} 1 & -2 \le j \le 2, \\ 0 & \text{otherwise.} \end{cases}$$
Figure 9.6. Discrete convolution of a box function with a step function.

What is the result of convolving a and b? At a particular index i, as shown in
Figure 9.6, the result is the average of the step function over the range from i − 2
to i + 2. If i < −2, we are averaging all zeros and the result is zero. If i ≥ 2,
we are averaging all ones and the result is one. In between there are i + 3 ones,
resulting in the value (i + 3)/5. The output is a linear ramp that goes from 0 to 1 over
five samples: $\frac{1}{5}[\ldots, 0, 0, 1, 2, 3, 4, 5, 5, \ldots]$.
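
This example is small enough to check by direct evaluation. The sketch below uses exact fractions and evaluates the convolution sum of the five-point box with the step function; the helper names are ours.

```python
from fractions import Fraction


def step(i):
    """The step function: 1 for i >= 0, 0 for i < 0."""
    return 1 if i >= 0 else 0


def box_step(i, r=2):
    """Convolution of the radius-r box filter with the step, at index i."""
    weight = Fraction(1, 2 * r + 1)
    return sum(weight * step(i - j) for j in range(-r, r + 1))


ramp = [box_step(i) for i in range(-4, 4)]
# ramp == [0, 0, 1/5, 2/5, 3/5, 4/5, 1, 1]
```

The values rise by 1/5 per sample between i = −2 and i = 2, matching the ramp described in the example.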

Properties of Convolution 

The way we’ve written it so far, convolution seems like an asymmetric operation:
b is the sequence we’re smoothing, and a provides the weights. But one of the nice
properties of convolution is that it actually doesn’t make any difference which is
which: the filter and the signal are interchangeable. To see this, just rethink the
sum in Equation (9.2) with the indices counting from the origin of the sequence
b, rather than from the origin of a where we are computing the value. That is, we
replace j with i − k. The result of this change of variable is

$$(a \star b)[i] = \sum_k a[i-k]\, b[i-(i-k)] = \sum_k b[k]\, a[i-k].$$

This is exactly the same as Equation (9.2) but with b acting as the filter and a
acting as the signal. So for any sequences a and b, $(a \star b) = (b \star a)$, and we say
that convolution is a commutative operation.¹

Figure 9.7. The discrete identity filter.

More generally, convolution is a “multiplication-like” operation. Like multiplication
or addition of numbers or functions, neither the order of the arguments
nor the placement of parentheses affects the result. Also, convolution relates to
addition in the same way that multiplication does. To be precise, convolution is
commutative and associative, and it is distributive over addition.

commutative: $(a \star b)[i] = (b \star a)[i]$

associative: $(a \star (b \star c))[i] = ((a \star b) \star c)[i]$

distributive: $(a \star (b + c))[i] = (a \star b + a \star c)[i]$

These properties are very natural if we think of convolution as being like multiplication,
and they are very handy to know about because they can help us save
work by simplifying convolutions before we actually compute them. For instance,
suppose we want to take a sequence b and convolve it with three filters, $a_1$, $a_2$,
and $a_3$; that is, we want $a_3 \star (a_2 \star (a_1 \star b))$. If the sequence is long and the
filters are short (that is, they have small radii), it is much faster to first convolve
the three filters together (computing $a_1 \star a_2 \star a_3$) and finally to convolve the result
with the signal, computing $(a_1 \star a_2 \star a_3) \star b$, which we know from commutativity
and associativity gives the same result.
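
The saving described above can be tried out with a small sketch. The helper `convolve_full` is our own: it treats each list as a finite sequence that starts at index 0 and is zero elsewhere, so the filters are not centered as they are in the text. That shifts the output, but it does not affect commutativity or associativity.

```python
def convolve_full(a, b):
    """Full discrete convolution of two finite sequences stored as lists.

    Entry k of the result is the sum of a[j] * b[k - j] over all valid j.
    """
    out = [0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            out[j + k] += aj * bk
    return out


a1, a2, a3 = [1, 1], [1, 2, 1], [1, -1]
b = [3, 0, 2, 5, 1, 4]

# Filter the signal three times in succession...
one_at_a_time = convolve_full(a3, convolve_full(a2, convolve_full(a1, b)))
# ...or combine the three short filters first and filter the signal once.
combined_filter = convolve_full(convolve_full(a1, a2), a3)
all_at_once = convolve_full(combined_filter, b)
```

Both routes give identical sequences, and the second touches the long signal only once.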

A very simple filter serves as an identity for discrete convolution: it is the
discrete filter of radius zero, or the sequence d[i] = ..., 0, 0, 1, 0, 0, ... (Figure
9.7). If we convolve d with a signal b, there will be only one non-zero term in the
sum:

$$(d \star b)[i] = \sum_{j=0}^{j=0} d[j]\, b[i-j] = b[i].$$

¹ You may have noticed that one of the functions in the convolution sum seems to be flipped over;
that is, a[j] gives the weight for the sample j units earlier in the sequence, while a[−j] gives the
weight for the sample j units later in the sequence. The reason for this has to do with ensuring
associativity; see Exercise 4. Most of the filters we use are symmetric, so you hardly ever need to
worry about this.
So clearly, convolving b with d just gives back b again. The sequence d is known
as the discrete impulse. It is occasionally useful in expressing a filter: for instance,
the process of smoothing a signal b with a filter a and then subtracting that from
the original could be expressed as a single convolution with the filter d − a:

$$c = b - a \star b = d \star b - a \star b = (d - a) \star b.$$
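
As a quick numerical check of this identity, the sketch below builds the filter d − a for a three-tap smoothing filter and compares the two ways of computing c. The helper `convolve_full` and the particular filter values are our own choices; each list is treated as a finite sequence padded with zeros.

```python
def convolve_full(a, b):
    """Full discrete convolution of two finite sequences stored as lists."""
    out = [0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            out[j + k] += aj * bk
    return out


d = [0, 1, 0]            # discrete impulse, padded to three taps
a = [0.25, 0.5, 0.25]    # a small smoothing filter
d_minus_a = [di - ai for di, ai in zip(d, a)]
b = [4, 8, 4, 0, 12]

identity = convolve_full(d, b)       # b again, with one zero at each end
smooth = convolve_full(a, b)
two_steps = [x - y for x, y in zip(identity, smooth)]
one_step = convolve_full(d_minus_a, b)
```

The weights here are multiples of 1/4, so the floating-point arithmetic is exact and the two results can be compared directly.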


9.2.3 Convolution as a Sum of Shifted Filters 

There is a second, entirely equivalent, way of interpreting Equation (9.2). Looking
at the samples of $a \star b$ one at a time leads to the weighted-average interpretation
that we have already seen. But if we omit the [i], we can instead think of the sum
as adding together entire sequences. One piece of notation is required to make
this work: if b is a sequence, then the same sequence shifted to the right by j
places is called $b_{\to j}$ (Figure 9.8):

$$b_{\to j}[i] = b[i-j].$$

Then, we can write Equation (9.2) as a statement about the whole sequence $(a \star b)$
rather than element-by-element:

$$(a \star b) = \sum_j a[j]\, b_{\to j}.$$



Looking at it this way, the convolution is a sum of shifted copies of b, weighted
by the entries of a (Figure 9.9). Because of commutativity, we can pick either a
or b as the filter; if we choose b, then we are adding up one copy of the filter for
every sample in the input.

Figure 9.9. Discrete convolution as a sum of shifted copies of the filter.
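
The shifted-copies view translates into a "scatter" style of loop: rather than gathering a weighted average for each output sample, we add whole shifted sequences into the output. The sketch below assumes the weight a[j] is stored at list position j + r and that the output covers the same index range as b; both conventions are ours.

```python
def convolve_by_shifting(a, r, b):
    """Build the convolution of a and b as a sum of shifted copies of b.

    For each j from -r to r, a copy of b shifted right by j places and
    scaled by a[j] (stored at a[j + r]) is added to the output. Shifted
    samples that fall off either end of the output are dropped.
    """
    out = [0.0] * len(b)
    for j in range(-r, r + 1):
        for i in range(len(b)):
            src = i - j  # the shifted copy has value b[i - j] at index i
            if 0 <= src < len(b):
                out[i] += a[j + r] * b[src]
    return out


spread = convolve_by_shifting([0.25, 0.5, 0.25], 1, [0, 0, 4, 0, 0])
# spread == [0.0, 1.0, 2.0, 1.0, 0.0]
```

With a single non-zero input sample, the output is one copy of the filter centered on that sample and scaled by its value.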

9.2.4 Convolution with Continuous Functions 

While it is true that discrete sequences are what we actually work with in a computer
program, these sampled sequences are supposed to represent continuous
functions, and often we need to reason mathematically about the continuous functions
in order to figure out what to do. For this reason it is useful to define convolution
between continuous functions and also between continuous and discrete
functions.

The convolution of two continuous functions is the obvious generalization of
Equation (9.2), with an integral replacing the sum:

$$(f \star g)(x) = \int_{-\infty}^{+\infty} f(t)\, g(x-t)\, dt. \tag{9.3}$$


One way of interpreting this definition is that the convolution of f and g, evaluated
at the argument x, is the area under the curve of the product of the two functions
after we shift f so that f(0) lines up with g(x). Just like in the discrete case,
the convolution is a moving average, with the filter providing the weights for the
average (see Figure 9.10).

Figure 9.10. Continuous convolution.

Like discrete convolution, convolution of continuous functions is commutative
and associative, and it is distributive over addition. Also as with the discrete
case, the continuous convolution can be seen as a sum of copies of the filter rather
than the computation of weighted averages. Except, in this case, there are infinitely
many copies of the filter:

$$(f \star g) = \int_{-\infty}^{+\infty} f(t)\, g_{\to t}\, dt.$$


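Example (Convolution of two box functions). Let f be a box function:

$$f(x) = \begin{cases} 1 & -\frac{1}{2} \le x < \frac{1}{2}, \\ 0 & \text{otherwise.} \end{cases}$$

Then what is $f \star f$? The definition (Equation (9.3)) gives

$$(f \star f)(x) = \int_{-\infty}^{\infty} f(t)\, f(x-t)\, dt.$$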

Figure 9.11 shows the two cases of this integral. The two boxes might have zero
overlap, which happens when x ≤ −1 or x ≥ 1; in this case the result is zero.
When −1 < x < 1, the overlap depends on the separation between the two boxes,
which is |x|; the result is 1 − |x|. So

$$(f \star f)(x) = \begin{cases} 1 - |x| & -1 < x < 1, \\ 0 & \text{otherwise.} \end{cases}$$

This function, known as the tent function, is another common filter (see Section 9.3.1).

Figure 9.11. Convolving two boxes yields a tent function.
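
The result can be checked numerically by approximating the convolution integral with a midpoint rule. This is only a sanity check under our own assumptions: a unit-width box, an integration range of [−2, 2] (wide enough to contain the box), and a step count chosen for convenience.

```python
def box(x):
    """Unit-width box: 1 on [-1/2, 1/2), 0 elsewhere."""
    return 1.0 if -0.5 <= x < 0.5 else 0.0


def tent(x):
    """Tent of radius 1: 1 - |x| inside (-1, 1), 0 elsewhere."""
    return max(0.0, 1.0 - abs(x))


def convolve_numeric(f, g, x, lo=-2.0, hi=2.0, steps=4000):
    """Midpoint-rule estimate of the integral of f(t) g(x - t) over [lo, hi]."""
    dt = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        t = lo + (k + 0.5) * dt
        total += f(t) * g(x - t)
    return total * dt


# Compare the numerical convolution with the tent at a few sample points.
errors = [abs(convolve_numeric(box, box, x) - tent(x))
          for x in (-1.5, -0.5, 0.0, 0.25, 0.75, 1.0)]
```

The largest error is on the order of the integration step, as expected for an integrand with jump discontinuities.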


Figure 9.12. The Dirac delta function δ(x).


The Dirac Delta Function 


In discrete convolution, we saw that the discrete impulse d acted as an identity:
$d \star a = a$. In the continuous case, there is also an identity function, called the
Dirac impulse or Dirac delta function, denoted δ(x).

Intuitively, the delta function is a very narrow, very tall spike that has infinitesimal
width but still has area equal to 1 (Figure 9.12). The key defining property of
the delta function is that multiplying it by a function and integrating selects out
the value exactly at zero:

$$\int_{-\infty}^{\infty} \delta(x)\, f(x)\, dx = f(0).$$

The delta function does not have a well-defined value at 0 (you can think of its
value loosely as +∞), but it does have the value δ(x) = 0 for all x ≠ 0.

From this property of selecting out single values, it follows that the delta function
is the identity for continuous convolution (Figure 9.13). The convolution of
δ with a function f is

$$(\delta \star f)(x) = \int_{-\infty}^{\infty} \delta(t)\, f(x-t)\, dt = f(x).$$

So $\delta \star f = f$.

Figure 9.13. Convolving a function with δ(x) returns a copy of the same function.

9.2.5 Discrete-Continuous Convolution 

There are two ways to connect the discrete and continuous worlds. One is sampling:
we convert a continuous function into a discrete one by writing down the
function’s value at all integer arguments and forgetting about the rest. Given a
continuous function f(x), we can sample it to convert to a discrete sequence a[i]:

$$a[i] = f(i).$$

Going the other way, from a discrete function, or sequence, to a continuous function,
is called reconstruction. This is accomplished using yet another form of
convolution, the discrete-continuous form. In this case, we are filtering a discrete
sequence a[i] with a continuous filter f(x):

$$(a \star f)(x) = \sum_i a[i]\, f(x-i).$$

The value of the reconstructed function $a \star f$ at x is a weighted sum of the samples
a[i] for values of i near x (Figure 9.14). The weights come from the filter f, which
is evaluated at a set of points spaced one unit apart. For example, if x = 5.3 and
f has radius 2, f is evaluated at 1.3, 0.3, −0.7, and −1.7. Note that for discrete-continuous
convolution we generally write the sequence first and the filter second,
so that the sum is over integers.

Figure 9.14. Discrete-continuous convolution.

As with discrete convolution, we can put bounds on the sum if we know the
filter’s radius, r, eliminating all points where the difference between x and i is at
least r:

$$(a \star f)(x) = \sum_{i=\lceil x-r \rceil}^{\lfloor x+r \rfloor} a[i]\, f(x-i).$$

Note that if a point falls exactly at distance r from x (i.e., if x − r turns out to be
an integer), it will be left out of the sum. This is in contrast to the discrete case,
where we included the point at i − r.

Expressed in code, this is: 
    function reconstruct(sequence a, filter f, real x)
        s = 0
        r = f.radius
        for i = ⌈x − r⌉ to ⌊x + r⌋ do
            s = s + a[i] f(x − i)
        return s
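The pseudocode above translates almost line for line into Python. The following is a minimal sketch rather than a definitive implementation: the tent filter of radius 1, the explicit `radius` argument, and the choice to treat samples outside the stored range as zero are all assumptions made for illustration.

```python
import math

def tent(x):
    """Tent filter of radius 1 (chosen here only as an example filter)."""
    return max(0.0, 1.0 - abs(x))

def reconstruct(a, f, radius, x):
    """Evaluate (a * f)(x) for a sequence a and a continuous filter f.

    Samples outside the stored range are treated as zero; that is an
    assumption of this sketch, not something the text prescribes.
    """
    s = 0.0
    lo = math.ceil(x - radius)
    hi = math.floor(x + radius)
    for i in range(lo, hi + 1):
        if 0 <= i < len(a):
            s += a[i] * f(x - i)
    return s

samples = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0]
value = reconstruct(samples, tent, 1.0, 5.3)
```

With the tent filter, the result is the linear interpolation between `samples[5]` and `samples[6]`, which is what one would expect from a piecewise linear reconstruction.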

As with the other forms of convolution, discrete-continuous convolution may be seen as summing shifted copies of the filter (Figure 9.15):

$$(a * f) = \sum_i a[i]\, f_{\to i},$$

where $f_{\to i}$ denotes the filter shifted by i, that is, $f_{\to i}(x) = f(x - i)$.

Figure 9.15. Reconstruction (discrete-continuous convolution) as a sum of shifted copies of the filter.

Discrete-continuous convolution is closely related to splines. For uniform splines (a uniform B-spline, for instance), the parameterized curve for the spline is exactly the convolution of the spline’s basis function with the control point sequence (see Section 15.6.2).

9.2.6 Convolution in More Than One Dimension 

So far, everything we have said about sampling and reconstruction has been one-dimensional: there has been a single variable x or a single sequence index i. Many of the important applications of sampling and reconstruction in graphics, though, are applied to two-dimensional functions, in particular to 2D images. Fortunately, the generalization of sampling algorithms and theory from 1D to 2D, 3D, and beyond is conceptually very simple.

Beginning with the definition of discrete convolution, we can generalize it to two dimensions by making the sum into a double sum:

$$(a * b)[i, j] = \sum_{i'} \sum_{j'} a[i', j']\, b[i - i', j - j'].$$

If a is a finitely supported filter of radius r (that is, it has $(2r + 1)^2$ values), then we can write this sum with bounds (Figure 9.16):

$$(a * b)[i, j] = \sum_{i'=-r}^{r} \sum_{j'=-r}^{r} a[i', j']\, b[i - i', j - j']$$

Figure 9.16. The weights for the nine input samples that contribute to the discrete convolution at point (i, j) with a filter a of radius 1.

Figure 9.17. The weight for an infinitesimal area in the input signal resulting from continuous convolution at (x, y).

and express it in code:

    function convolve2d(sequence2d a, sequence2d b, int i, int j)
        s = 0
        r = a.radius
        for i' = −r to r do
            for j' = −r to r do
                s = s + a[i'][j'] b[i − i'][j − j']
        return s
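A direct Python rendering of the bounded double sum can make the index bookkeeping concrete. This is a sketch under stated assumptions: the filter is stored as a (2r+1) × (2r+1) list of lists with the mathematical index a[i', j'] at `a[i' + r][j' + r]`, the radius is passed explicitly, and input samples outside the array are treated as zero.

```python
def convolve2d(a, r, b, i, j):
    """Discrete 2D convolution (a * b)[i, j] for a filter a of radius r.

    a is a (2r+1) x (2r+1) list of lists; the mathematical index a[i', j']
    is stored at a[i' + r][j' + r]. Out-of-range samples of b count as zero
    (an assumption of this sketch).
    """
    s = 0.0
    for ip in range(-r, r + 1):
        for jp in range(-r, r + 1):
            bi, bj = i - ip, j - jp
            if 0 <= bi < len(b) and 0 <= bj < len(b[0]):
                s += a[ip + r][jp + r] * b[bi][bj]
    return s

# A 3 x 3 box filter applied to a small ramp image.
box3 = [[1.0 / 9.0] * 3 for _ in range(3)]
image = [[float(4 * row + col) for col in range(4)] for row in range(4)]
center = convolve2d(box3, 1, image, 1, 1)
```

Because the test image is a linear ramp and the box filter is symmetric, the filtered value at (1, 1) equals the original value there.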

This definition can be interpreted in the same way as in the 1D case: each output sample is a weighted average of an area in the input, using the 2D filter as a “mask” to determine the weight of each sample in the average.

Continuing the generalization, we can write continuous-continuous (Figure 9.17) and discrete-continuous (Figure 9.18) convolutions in 2D as well:

$$(f * g)(x, y) = \int\!\!\int f(x', y')\, g(x - x', y - y')\, dx'\, dy';$$

$$(a * f)(x, y) = \sum_i \sum_j a[i, j]\, f(x - i, y - j).$$

In each case, the result at a particular point is a weighted average of the input near that point. For the continuous-continuous case, it is a weighted integral over a region centered at that point, and in the discrete-continuous case it is a weighted average of all the samples that fall near the point.

Once we have gone from 1D to 2D, it should be fairly clear how to generalize further to 3D or even to higher dimensions.


9.3 Convolution Filters 

Now that we have the machinery of convolution, let’s examine some of the particular filters commonly used in graphics.


9.3.1 A Gallery of Convolution Filters 


The Box Filter 

The box filter (Figure 9.19) is a piecewise constant function whose integral is equal to one. As a discrete filter, it can be written as

$$a_{\mathrm{box},r}[i] = \begin{cases} 1/(2r+1) & |i| \le r, \\ 0 & \text{otherwise.} \end{cases}$$

Note that for symmetry we include both endpoints. As a continuous filter, we write

$$f_{\mathrm{box},r}(x) = \begin{cases} 1/(2r) & -r \le x < r, \\ 0 & \text{otherwise.} \end{cases}$$

In this case, we exclude one endpoint, which makes the box of radius 0.5 usable as a reconstruction filter. It is because the box filter is discontinuous that these boundary cases are important, and so for this particular filter, we need to pay attention to them. We write just $f_{\mathrm{box}}$ for the common case of r = 1/2.


The Tent Filter 

The tent, or linear, filter (Figure 9.20) is a continuous, piecewise linear function:

$$f_{\mathrm{tent}}(x) = \begin{cases} 1 - |x| & |x| < 1, \\ 0 & \text{otherwise;} \end{cases} \qquad f_{\mathrm{tent},r}(x) = \frac{f_{\mathrm{tent}}(x/r)}{r}.$$

Figure 9.19. The discrete and continuous box filters.

Figure 9.20. The tent filter and two scaled versions.

Figure 9.21. The Gaussian filter.

Figure 9.22. The B-spline filter.

Figure 9.23. The Catmull-Rom filter.

For filters that are at least $C^0$ (that is, there are no sudden jumps in the value, as there are with the box), we no longer need to separate the definitions of the discrete and continuous filters: the discrete filter is just the continuous filter sampled at the integers. Also note that for simplicity we define $f_{\mathrm{tent},r}$ by scaling the “standard size” tent filter $f_{\mathrm{tent}}$. From now on, we’ll take this scaling for granted: once we define a filter f, then we can use $f_r$ to mean “the filter f stretched out by r and also scaled down by r.” Note that $f_r$ has the same integral as f, and we will always make sure that the value of the integral is equal to 1.0.
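The scaling convention is easy to check numerically. The sketch below assumes the unit tent filter and uses a hypothetical helper `scaled` to build f_r(x) = f(x/r)/r; the midpoint-rule integrator is only there to confirm that the integral stays at 1 after scaling.

```python
def tent(x):
    """Unit tent filter: 1 - |x| inside (-1, 1), zero elsewhere."""
    return max(0.0, 1.0 - abs(x))

def scaled(f, r):
    """Return f_r(x) = f(x / r) / r: f stretched out by r and scaled down by r."""
    return lambda x: f(x / r) / r

def integrate(f, lo, hi, n=20000):
    """Midpoint-rule estimate of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))

tent3 = scaled(tent, 3.0)
area1 = integrate(tent, -1.0, 1.0)
area3 = integrate(tent3, -3.0, 3.0)
```

Stretching by 3 triples the width and divides the peak by 3, so both areas come out equal to 1 up to rounding.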

The Gaussian Filter 

The Gaussian function (Figure 9.21), also known as the normal distribution, is an important filter theoretically and practically. We’ll see more of its special properties as the chapter goes on:

$$f_g(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}.$$

The Gaussian does not have finite support, although because of the exponential decay, its values rapidly become small enough to ignore. When necessary, then, we can trim the tails from the function by setting it to zero outside some radius. The Gaussian makes a good sampling filter because it is very smooth; we’ll make this statement more precise later in the chapter.

The B-spline Cubic Filter 

Many filters are defined as piecewise polynomials, and cubic filters with four pieces are often used as reconstruction filters. One such filter is known as the B-spline filter (Figure 9.22) because of its origins as a blending function for spline curves (see Chapter 15):

$$f_B(x) = \frac{1}{6}\begin{cases} -3(1-|x|)^3 + 3(1-|x|)^2 + 3(1-|x|) + 1 & -1 \le x \le 1, \\ (2-|x|)^3 & 1 \le |x| \le 2, \\ 0 & \text{otherwise.} \end{cases}$$

Among piecewise cubics, the B-spline is special because it has continuous first and second derivatives; that is, it is $C^2$. A more concise way of defining this filter is $f_B = f_{\mathrm{box}} * f_{\mathrm{box}} * f_{\mathrm{box}} * f_{\mathrm{box}}$; proving that the longer form above is equivalent is a nice exercise in convolution (see Exercise 3).

The Catmull-Rom Cubic Filter 

Another piecewise cubic filter named for a spline, the Catmull-Rom filter (Figure 9.23), has the value zero at x = −2, −1, 1, and 2, which means it will interpolate the samples when used as a reconstruction filter (Section 9.3.2):

$$f_C(x) = \frac{1}{2}\begin{cases} -3(1-|x|)^3 + 4(1-|x|)^2 + (1-|x|) & -1 \le x \le 1, \\ (2-|x|)^3 - (2-|x|)^2 & 1 \le |x| \le 2, \\ 0 & \text{otherwise.} \end{cases}$$

The Mitchell-Netravali Cubic Filter 

For the all-important application of resampling images, Mitchell and Netravali (Mitchell & Netravali, 1988) made a study of cubic filters and recommended one part way between the previous two filters as the best all-around choice (Figure 9.24). It is simply a weighted combination of the previous two filters:

$$f_M(x) = \tfrac{1}{3} f_B(x) + \tfrac{2}{3} f_C(x) = \frac{1}{18}\begin{cases} -21(1-|x|)^3 + 27(1-|x|)^2 + 9(1-|x|) + 1 & -1 \le x \le 1, \\ 7(2-|x|)^3 - 6(2-|x|)^2 & 1 \le |x| \le 2, \\ 0 & \text{otherwise.} \end{cases}$$

Figure 9.24. The Mitchell-Netravali filter.

9.3.2 Properties of Filters 


Filters have some traditional terminology that goes with them, which we use to 
describe the filters and compare them to one another. 


The impulse response of a filter is just another name for the function: it is the response of the filter to a signal that just contains an impulse (and recall that convolving with an impulse just gives back the filter).


A continuous filter is interpolating if, when it is used to reconstruct a continuous function from a discrete sequence, the resulting function takes on exactly the values of the samples at the sample points; that is, it “connects the dots” rather than producing a function that only goes near the dots. Interpolating filters are exactly those filters f for which f(0) = 1 and f(i) = 0 for all non-zero integers i (Figure 9.25).

Figure 9.25. An interpolating filter reconstructs the sample points exactly because it has the value zero at all non-zero integer offsets from the center.
A filter that takes on negative values has ringing or overshoot: it will produce extra oscillations in the value around sharp changes in the value of the function being filtered.

For instance, the Catmull-Rom filter has negative lobes on either side, and if you filter a step function with it, it will exaggerate the step a bit, resulting in function values that undershoot 0 and overshoot 1 (Figure 9.26).

Figure 9.26. A filter with negative lobes will always produce some overshoot when filtering or reconstructing a sharp discontinuity.

A continuous filter is ripple free if, when used as a reconstruction filter, it will reconstruct a constant sequence as a constant function (Figure 9.27). This is equivalent to the requirement that the filter sum to one on any integer-spaced grid:

$$\sum_i f(x + i) = 1 \quad \text{for all } x.$$

A continuous filter has a degree of continuity, which is the highest-order derivative that is defined everywhere. A filter, like the box filter, that has sudden jumps in its value is not continuous at all. A filter that is continuous but has sharp corners (discontinuities in the first derivative), such as the tent filter, has order of continuity zero, and we say it is $C^0$. A filter that has a continuous derivative (no sharp corners), such as the piecewise cubic filters in the previous section, is $C^1$; if its second derivative is also continuous, as is true of the B-spline filter, it is $C^2$. The order of continuity of a filter is particularly important for a reconstruction filter because the reconstructed function inherits the continuity of the filter.

Figure 9.27. The tent filter of radius 1 is a ripple-free reconstruction filter; the Gaussian filter with standard deviation 1/2 is not.
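The ripple-free condition can be tested numerically by summing a filter over an integer-spaced grid at several offsets. The sketch below does this for the tent filter of radius 1 and a Gaussian with standard deviation 1/2, the two filters named in Figure 9.27. The helper `grid_sum`, the number of offsets, and the truncation of the sum to 21 terms are assumptions made for illustration.

```python
import math

def tent(x):
    return max(0.0, 1.0 - abs(x))

def gaussian(x, sigma=0.5):
    """Normalized Gaussian with standard deviation sigma."""
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

def grid_sum(f, x, reach=10):
    """Sum of f(x + i) over integers i in [-reach, reach]."""
    return sum(f(x + i) for i in range(-reach, reach + 1))

offsets = [k / 10 for k in range(10)]
tent_sums = [grid_sum(tent, x) for x in offsets]
gauss_sums = [grid_sum(gaussian, x) for x in offsets]

# Peak-to-peak variation of the grid sum as the offset moves.
tent_ripple = max(tent_sums) - min(tent_sums)
gauss_ripple = max(gauss_sums) - min(gauss_sums)
```

The tent filter's grid sum stays at 1 for every offset, while the narrow Gaussian's sum varies by a few percent, which is the ripple that would show up when reconstructing a constant sequence.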
Separable Filters

So far we have only discussed filters for 1D convolution, but for images and other multidimensional signals we need filters too. In general, any 2D function could be a 2D filter, and occasionally it is useful to define them this way. But, in most cases, we can build suitable 2D (or higher-dimensional) filters from the 1D filters we have already seen.

The most useful way of doing this is by using a separable filter. The value of a separable filter $f_2(x, y)$ at a particular x and y is simply the product of $f_1$ (the 1D filter) evaluated at x and at y:

$$f_2(x, y) = f_1(x)\, f_1(y).$$

Similarly, for discrete filters,

$$a_2[i, j] = a_1[i]\, a_1[j].$$

Any horizontal or vertical slice through $f_2$ is a scaled copy of $f_1$. The integral of $f_2$ is the square of the integral of $f_1$, so in particular if $f_1$ is normalized, then so is $f_2$.


Example (The separable tent filter). If we choose the tent function for $f_1$, the resulting piecewise bilinear function (Figure 9.28) is

$$f_{2,\mathrm{tent}}(x, y) = \begin{cases} (1 - |x|)(1 - |y|) & |x| < 1 \text{ and } |y| < 1, \\ 0 & \text{otherwise.} \end{cases}$$

The profiles along the coordinate axes are tent functions, but the profiles along the diagonals are quadratics (for instance, along the line x = y in the positive quadrant, we see the quadratic function $(1 - x)^2$).

Figure 9.28. The separable 2D tent filter.


Figure 9.29. The 2D Gaussian filter, which is both separable and radially symmetric.

Example (The 2D Gaussian filter). If we choose the Gaussian function for $f_1$, the resulting 2D function (Figure 9.29) is

$$f_{2,g}(x, y) = \frac{1}{2\pi}\, e^{-x^2/2}\, e^{-y^2/2} = \frac{1}{2\pi}\, e^{-(x^2 + y^2)/2}.$$

Notice that this is (up to a scale factor) the same function we would get if we revolved the 1D Gaussian around the origin to produce a circularly symmetric function. The property of being both circularly symmetric and separable at the same time is unique to the Gaussian function. The profiles along the coordinate axes are Gaussians, but so are the profiles along any direction at any offset from the center.

The key advantage of separable filters over other 2D filters has to do with efficiency in implementation. Let’s substitute the definition of $a_2$ into the definition of discrete convolution:

$$(a_2 * b)[i, j] = \sum_{i'} \sum_{j'} a_1[i']\, a_1[j']\, b[i - i', j - j'].$$

Note that $a_1[i']$ does not depend on j' and can be factored out of the inner sum:

$$(a_2 * b)[i, j] = \sum_{i'} a_1[i'] \sum_{j'} a_1[j']\, b[i - i', j - j'].$$

Let’s abbreviate the inner sum as S[k]:

$$S[k] = \sum_{j'} a_1[j']\, b[k, j - j'];$$

$$(a_2 * b)[i, j] = \sum_{i'} a_1[i']\, S[i - i']. \tag{9.4}$$

With the equation in this form, we can first compute and store S[i − i'] for each value of i', and then compute the outer sum using these stored values. At first glance this does not seem remarkable, since we still had to do work proportional to $(2r + 1)^2$ to compute all the inner sums. However, it’s quite different if we want to compute the value at many points [i, j].

Suppose we need to compute $a_2 * b$ at [2, 2] and [3, 2], and $a_1$ has a radius of 2. Examining Equation (9.4), we can see that we will need S[0], ..., S[4] to compute the result at [2, 2], and we will need S[1], ..., S[5] to compute the result at [3, 2]. So, in the separable formulation, we can just compute all six values of S and share S[1], ..., S[4] (Figure 9.30).

This savings has great significance for large filters. Filtering an m by n 2D image with a filter of radius r in the general case requires computation of $(2r + 1)^2$ products per pixel, while filtering the image with a separable filter of the same size requires $2(2r + 1)$ products (at the expense of some intermediate storage). This change in asymptotic complexity from $O(r^2)$ to $O(r)$ enables the use of much larger filters.

The algorithm is: 

    function filterImage(image I, filter f)
        r = f.radius
        n_x = I.width
        n_y = I.height
        allocate storage array S[0 ... n_x − 1]
        allocate image I_out[r ... n_x − r − 1, r ... n_y − r − 1]
        initialize S and I_out to all zero
        for y = r to n_y − r − 1 do
            for x = 0 to n_x − 1 do
                S[x] = 0
                for i = −r to r do
                    S[x] = S[x] + f[i] I[x, y − i]
            for x = r to n_x − r − 1 do
                for i = −r to r do
                    I_out[x, y] = I_out[x, y] + f[i] S[x − i]
        return I_out

Figure 9.30. Computing two output points using separate 2D arrays of 25 samples (above) vs. filtering once along the columns, then using separate 1D arrays of five samples (below).

For simplicity, this function avoids all questions of boundaries by trimming r pixels off all four sides of the output image. In practice there are various ways to handle the boundaries; see Section 9.4.3.
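To see that the two-pass scheme gives the same answer as the full 2D sum, the sketch below implements both and compares them on a small test image. It follows the structure of the pseudocode (one row of intermediate sums S per output row, and r pixels trimmed from each side), but the storage layout `img[x][y]`, the weight list with `w[i + r]` holding the weight at offset i, and the test data are assumptions of this example.

```python
def filter_separable(img, w, r):
    """Filter img[x][y] with the separable kernel w[i] * w[j].

    w holds 2r+1 weights, with w[i + r] the weight at offset i.
    The output is trimmed by r pixels on every side.
    """
    nx, ny = len(img), len(img[0])
    out = [[0.0] * (ny - 2 * r) for _ in range(nx - 2 * r)]
    for y in range(r, ny - r):
        # Pass 1: filter along y for every column x of this output row.
        S = [sum(w[i + r] * img[x][y - i] for i in range(-r, r + 1))
             for x in range(nx)]
        # Pass 2: filter the stored sums along x.
        for x in range(r, nx - r):
            out[x - r][y - r] = sum(w[i + r] * S[x - i] for i in range(-r, r + 1))
    return out

def filter_direct(img, w, r):
    """Reference version: the full (2r+1)^2 sum at every output pixel."""
    nx, ny = len(img), len(img[0])
    out = [[0.0] * (ny - 2 * r) for _ in range(nx - 2 * r)]
    for x in range(r, nx - r):
        for y in range(r, ny - r):
            out[x - r][y - r] = sum(
                w[i + r] * w[j + r] * img[x - i][y - j]
                for i in range(-r, r + 1) for j in range(-r, r + 1))
    return out

weights = [0.25, 0.5, 0.25]
img = [[float((3 * x + 5 * y) % 7) for y in range(6)] for x in range(5)]
fast = filter_separable(img, weights, 1)
slow = filter_direct(img, weights, 1)
max_diff = max(abs(p - q) for rf, rs in zip(fast, slow) for p, q in zip(rf, rs))
```

The two results agree up to rounding, while the separable version does 2(2r + 1) multiplications per output pixel instead of (2r + 1)².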


9.4 Signal Processing for Images 

We have discussed sampling, filtering, and reconstruction in the abstract so far, using mostly 1D signals for examples. But as we observed at the beginning of the chapter, the most important and most common application of signal processing in graphics is for sampled images. Let us look carefully at how all this applies to images.

9.4.1 Image Filtering Using Discrete Filters 

Perhaps the simplest application of convolution is processing images using discrete convolution. Some of the most widely used features of image manipulation programs are simple convolution filters. Blurring of images can be achieved by convolving with many common lowpass filters, ranging from the box to the Gaussian (Figure 9.31). A Gaussian filter creates a very smooth-looking blur and is commonly used for this purpose.

Figure 9.31. Blurring an image by convolution with each of three different filters.

Figure 9.32. Sharpening an image using a convolution filter.

The opposite of blurring is sharpening, and one way to do this is by using the “unsharp mask” procedure: subtract a fraction α of a blurred image from the original. With a rescaling to avoid changing the overall brightness, we have

$$\begin{aligned} I_{\mathrm{sharp}} &= (1 + \alpha)\, I - \alpha\, (f_{g,\sigma} * I) \\ &= \big((1 + \alpha)\, d - \alpha f_{g,\sigma}\big) * I \\ &= f_{\mathrm{sharp}}(\sigma, \alpha) * I, \end{aligned}$$

where $f_{g,\sigma}$ is the Gaussian filter of width σ. Using the discrete impulse d and the distributive property of convolution, we were able to write this whole process as a single filter that depends on both the width of the blur and the degree of sharpening (Figure 9.32).
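The unsharp mask can be sketched in 1D to show that the single combined filter reproduces the two-step procedure. The truncated, renormalized discrete Gaussian, the kernel radius of 3, and the step-edge test signal are all assumptions of this example, not choices made by the text.

```python
import math

def gaussian_kernel(sigma, radius):
    """Discrete Gaussian truncated at the given radius and renormalized to sum to 1."""
    w = [math.exp(-i * i / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]

def sharpen_kernel(sigma, alpha, radius):
    """Single filter (1 + alpha) d - alpha f_g, with d the discrete impulse."""
    k = [-alpha * v for v in gaussian_kernel(sigma, radius)]
    k[radius] += 1.0 + alpha
    return k

def convolve(signal, kernel, radius):
    """Discrete convolution, evaluated only where the kernel fits inside the signal."""
    return [sum(kernel[j + radius] * signal[i - j] for j in range(-radius, radius + 1))
            for i in range(radius, len(signal) - radius)]

signal = [0.0] * 8 + [1.0] * 8          # a step edge
alpha, sigma, radius = 0.5, 1.0, 3

k = sharpen_kernel(sigma, alpha, radius)
one_pass = convolve(signal, k, radius)

# Two-step version: blur, then combine with the original.
blurred = convolve(signal, gaussian_kernel(sigma, radius), radius)
two_step = [(1 + alpha) * signal[i + radius] - alpha * blurred[i]
            for i in range(len(blurred))]
```

The combined kernel sums to 1, so flat regions keep their brightness, and the sharpened step dips below 0 and rises above 1 on either side of the edge.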

Another example of combining two discrete filters is a drop shadow. It’s common to take a blurred, shifted copy of an object’s outline to create a soft drop shadow (Figure 9.33). We can express the shifting operation as convolution with an off-center impulse:

$$d_{m,n}(i, j) = \begin{cases} 1 & i = m \text{ and } j = n, \\ 0 & \text{otherwise.} \end{cases}$$

Figure 9.33. A soft drop shadow.

Shifting, then blurring, is achieved by convolving with both filters:

$$\begin{aligned} I_{\mathrm{shadow}} &= f_{g,\sigma} * (d_{m,n} * I) \\ &= (f_{g,\sigma} * d_{m,n}) * I \\ &= f_{\mathrm{shadow}}(m, n, \sigma) * I. \end{aligned}$$

Here we have used associativity to group the two operations into a single filter with three parameters.
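The associativity argument can be tried out on tiny sparse images. In the sketch below, images and filters are dictionaries mapping (i, j) to a value, and a 3 × 3 box blur stands in for the Gaussian to keep the example short; both choices, along with the function name `convolve2d_full`, are assumptions for illustration.

```python
def convolve2d_full(a, b):
    """Full 2D discrete convolution of two finitely supported arrays.

    Both inputs are dicts {(i, j): value}; missing entries count as zero.
    """
    out = {}
    for (i1, j1), va in a.items():
        for (i2, j2), vb in b.items():
            key = (i1 + i2, j1 + j2)
            out[key] = out.get(key, 0.0) + va * vb
    return out

# Box blur used as a stand-in for the Gaussian (an assumption of this sketch).
blur = {(i, j): 1.0 / 9.0 for i in (-1, 0, 1) for j in (-1, 0, 1)}
shift = {(2, 3): 1.0}                                 # impulse d_{m,n} with m = 2, n = 3
outline = {(0, 0): 1.0, (1, 0): 1.0, (0, 1): 1.0}     # a tiny "object"

shifted = convolve2d_full(shift, outline)

# Shift then blur, versus a single combined shadow filter.
shadow_two_steps = convolve2d_full(blur, shifted)
shadow_filter = convolve2d_full(blur, shift)
shadow_one_step = convolve2d_full(shadow_filter, outline)
```

Convolving with the off-center impulse moves every sample of the outline by (2, 3), and the combined filter gives the same shadow as applying the shift and the blur in sequence.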
Figure 9.34. Two artifacts of aliasing in images: moire patterns in periodic textures (left), and “jaggies” on straight lines (right).


9.4.2 Antialiasing in Image Sampling 

In image synthesis, we often have the task of producing a sampled representation 
of an image for which we have a continuous mathematical formula (or at least a 
procedure we can use to compute the color at any point, not just at integer pixel 
positions). Ray tracing is a common example (see Chapter 4). In the language 
of signal processing, we have a continuous 2D signal (the image) that we need to 
sample on a regular 2D lattice. If we go ahead and sample the image without any 
special measures, the result will exhibit various aliasing artifacts (Figure 9.34). At 
sharp edges in the image, we see stair-step artifacts known as “jaggies.” In areas 
where there are repeating patterns, we see wide bands known as moire patterns. 

The problem here is that the image contains too many small-scale features; we need to smooth it out by filtering it before sampling. Looking back at the definition of continuous convolution in Equation (9.3), we need to average the image over an area around the pixel location, rather than just taking the value at a single point. The specific methods for doing this are discussed in Chapter 4. A simple filter like a box will improve the appearance of sharp edges, but it still produces some moire patterns (Figure 9.35). The Gaussian filter, which is very smooth, is much more effective against the moire patterns, at the expense of somewhat more blurring overall. These two examples illustrate the tradeoff between sharpness and aliasing that is fundamental to choosing antialiasing filters.

Figure 9.35. A comparison of three different sampling filters being used to antialias a difficult test image that contains circles that are spaced closer and closer as they get larger.

9.4.3 Reconstruction and Resampling 

One of the most common image operations where careful filtering is crucial is resampling: changing the sample rate, or changing the image size.

Suppose we have taken an image with a digital camera that is 3000 by 2000 
pixels in size, and we want to display it on a monitor that has only 1280 by 1024 
pixels. In order to make it fit, while maintaining the 3:2 aspect ratio, we need to 
resample it to 1278 by 852 pixels. How should we go about this? 



Figure 9.36. Resampling an image consists of two logical steps that are combined into a 
single operation in code. First, we use a reconstruction filter to define a smooth, continuous 
function from the input samples. Then, we sample that function on a new grid to get the 
output samples. 
One way to approach this problem is to think of the process as dropping pixels:
the size ratio is between 2 and 3, so we’ll have to drop out one or two pixels 
between pixels that we keep. It’s possible to shrink an image in this way, but 
the quality of the result is low—the images in Figure 9.34 were made using pixel 
dropping. Pixel dropping is very fast, however, and it is a reasonable choice to 
make a preview of the resized image during an interactive manipulation. 

The way to think about resizing images is as a resampling operation: we 
want a set of samples of the image on a particular grid that is defined by the new 
image dimensions, and we get them by sampling a continuous function that is 
reconstructed from the input samples (Figure 9.36). Looking at it this way, it’s 
just a sequence of standard image processing operations: first we reconstruct a 
continuous function from the input samples, and then we sample that function 
just as we would sample any other continuous image. To avoid aliasing artifacts, 
appropriate filters need to be used at each stage. 

A small example is shown in Figure 9.37: if the original image is 12 × 9 pixels and the new one is 8 × 6 pixels, there are 2/3 as many output pixels as input pixels in each dimension, so their spacing across the image is 3/2 the spacing of the original samples.

Figure 9.37. The sample locations for the input and output grids in resampling a 12 by 9 image to make an 8 by 6 one.

In order to come up with a value for each of the output samples, we need to somehow compute values for the image in between the samples. The pixel-dropping algorithm gives us one way to do this: just take the value of the closest sample in the input image and make that the output value. This is exactly equivalent to reconstructing the image with a 1-pixel-wide box filter and then point sampling.

Of course, if the main reason for choosing pixel dropping or other very simple filtering is performance, one would never implement that method as a special case of the general reconstruction-and-resampling procedure. In fact, because of the discontinuities, it’s difficult to make box filters work in a general framework. But, for high-quality resampling, the reconstruction/sampling framework provides valuable flexibility.
To work out the algorithmic details it’s simplest to drop down to 1D and discuss resampling a sequence. The simplest way to write an implementation is in terms of the reconstruct function we defined in Section 9.2.5.

    function resample(sequence a, float x_0, float Δx, int n, filter f)
        create sequence b of length n
        for i = 0 to n − 1 do
            b[i] = reconstruct(a, f, x_0 + iΔx)
        return b

The parameter $x_0$ gives the position of the first sample of the new sequence in terms of the samples of the old sequence. That is, if the first output sample falls midway between samples 3 and 4 in the input sequence, $x_0$ is 3.5.

This procedure reconstructs a continuous image by convolving the input sequence with a continuous filter and then point samples it. That’s not to say that these two operations happen sequentially: the continuous function exists only in principle and its values are computed only at the sample points. But mathematically, this function computes a set of point samples of the function a * f.

This point sampling seems wrong, though, because we just finished saying that a signal should be sampled with an appropriate smoothing filter to avoid aliasing. We should be convolving the reconstructed function with a sampling filter g and point sampling g * (f * a). But since this is the same as (g * f) * a, we can roll the sampling filter together with the reconstruction filter; one convolution operation is all we need (Figure 9.38). This combined reconstruction and sampling filter is known as a resampling filter.

Figure 9.38. Resampling involves filtering for reconstruction and for sampling. Since two convolution filters applied in sequence can be replaced with a single filter, we only need one resampling filter, which serves the roles of reconstruction and sampling.
When resampling images, we usually specify a source rectangle in the units of the old image that specifies the part we want to keep in the new image. For example, using the pixel sample positioning convention from Chapter 3, the rectangle we’d use to resample the entire image is $(-0.5,\ n_x^{\mathrm{old}} - 0.5) \times (-0.5,\ n_y^{\mathrm{old}} - 0.5)$. Given a source rectangle $(x_l, x_h) \times (y_l, y_h)$, the sample spacing for the new image is $\Delta x = (x_h - x_l)/n_x^{\mathrm{new}}$ in x and $\Delta y = (y_h - y_l)/n_y^{\mathrm{new}}$ in y. The lower-left sample is positioned at $(x_l + \Delta x/2,\ y_l + \Delta y/2)$.

Modifying the 1D pseudocode to use this convention, and expanding the call to the reconstruct function into the double loop that is implied, we arrive at:

    function resample(sequence a, float x_l, float x_h, int n, filter f)
        create sequence b of length n
        r = f.radius
        Δx = (x_h − x_l)/n
        x_0 = x_l + Δx/2
        for i = 0 to n − 1 do
            s = 0
            x = x_0 + iΔx
            for j = ⌈x − r⌉ to ⌊x + r⌋ do
                s = s + a[j] f(x − j)
            b[i] = s
        return b
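A runnable Python version of the 1D routine is sketched below. It assumes a tent filter of radius 1 passed in as a plain function, takes the radius as a separate argument, and skips out-of-range input samples (the zero-padding option); those choices are illustrative rather than prescribed.

```python
import math

def tent(x):
    return max(0.0, 1.0 - abs(x))

def resample(a, x_l, x_h, n, f, r):
    """Resample sequence a over the source interval (x_l, x_h) into n samples.

    f is a continuous filter of radius r. Input samples outside the
    sequence are skipped, which amounts to zero padding (an assumption
    of this sketch).
    """
    dx = (x_h - x_l) / n
    x0 = x_l + dx / 2
    b = []
    for i in range(n):
        x = x0 + i * dx
        s = 0.0
        for j in range(math.ceil(x - r), math.floor(x + r) + 1):
            if 0 <= j < len(a):
                s += a[j] * f(x - j)
        b.append(s)
    return b

ramp = [float(k) for k in range(12)]
out = resample(ramp, 2.5, 8.5, 4, tent, 1.0)
```

Here the source interval covers input samples 3 through 8 and the four output samples land at 3.25, 4.75, 6.25, and 7.75. Since the input is a linear ramp and the tent filter interpolates linearly, the output values equal those positions. The filter in this example is sized for reconstruction only; a proper downsampling filter would be wider.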

This routine contains all the basics of resampling an image. One last issue that 
remains to be addressed is what to do at the edges of the image, where the simple 
version here will access beyond the bounds of the input sequence. There are 
several things we might do: 

• Just stop the loop at the ends of the sequence. This is equivalent to padding 
the image with zeros on all sides. 

• Clip all array accesses to the end of the sequence — that is, return a[0] when 
we would want to access a[—1]. This is equivalent to padding the edges of 
the image by extending the last row or column. 

• Modify the filter as we approach the edge so that it does not extend beyond 
the bounds of the sequence. 

The first option leads to dim edges when we resample the whole image, which is not really satisfactory. The second option is easy to implement; the third is probably the best performing. The simplest way to modify the filter near the edge of the image is to renormalize it: divide the filter by the sum of the part of the filter that falls within the image. This way, the filter always adds up to 1 over the actual image samples, so it preserves image intensity. For performance, it is desirable to handle the band of pixels within a filter radius of the edge (which require this renormalization) separately from the center (which contains many more pixels and does not require renormalization).

Figure 9.39. The effects of using different sizes of a filter for upsampling (enlarging) or downsampling (reducing) an image.
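The three boundary strategies can be compared on a constant sequence, where the ideal reconstruction is the same constant everywhere. The sketch below assumes a tent filter of radius 2 and evaluates each strategy just outside the left end of the sequence; the function names and test values are hypothetical.

```python
import math

def tent2(x):
    """Tent filter of radius 2, scaled to have unit integral."""
    return max(0.0, 1.0 - abs(x) / 2.0) / 2.0

def footprint(x, r):
    """Integer sample positions that fall under a filter of radius r centered at x."""
    return range(math.ceil(x - r), math.floor(x + r) + 1)

def reconstruct_zero_padded(a, f, r, x):
    """Option 1: stop at the ends of the sequence (implicit zero padding)."""
    return sum(a[j] * f(x - j) for j in footprint(x, r) if 0 <= j < len(a))

def reconstruct_clamped(a, f, r, x):
    """Option 2: clip every array access to the valid index range."""
    last = len(a) - 1
    return sum(a[min(max(j, 0), last)] * f(x - j) for j in footprint(x, r))

def reconstruct_renormalized(a, f, r, x):
    """Option 3: divide by the sum of the filter weights that fall inside the sequence."""
    inside = [j for j in footprint(x, r) if 0 <= j < len(a)]
    total = sum(f(x - j) for j in inside)
    if total == 0.0:
        return 0.0
    return sum(a[j] * f(x - j) for j in inside) / total

flat = [2.0] * 5
x_edge = -0.25
dim = reconstruct_zero_padded(flat, tent2, 2.0, x_edge)
clamped = reconstruct_clamped(flat, tent2, 2.0, x_edge)
renormalized = reconstruct_renormalized(flat, tent2, 2.0, x_edge)
```

Zero padding gives a value well below 2 near the edge (the dim border mentioned above), while clamping and renormalization both recover the constant.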

The choice of filter for resampling is important. There are two separate issues: the shape of the filter and the size (radius). Because the filter serves both as a reconstruction filter and a sampling filter, the requirements of both roles affect the choice of filter. For reconstruction, we would like a filter smooth enough to avoid aliasing artifacts when we enlarge the image, and the filter should be ripple-free. For sampling, the filter should be large enough to avoid undersampling and smooth enough to avoid moire artifacts. Figure 9.39 illustrates these two different needs.

Generally we will choose one filter shape and scale it according to the relative resolutions of the input and output. The lower of the two resolutions determines the size of the filter: when the output is more coarsely sampled than the input (downsampling, or shrinking the image), the smoothing required for proper sampling is greater than the smoothing required for reconstruction, so we size the filter according to the output sample spacing (radius 3 in Figure 9.39). On the other hand, when the output is more finely sampled (upsampling, or enlarging the image) then the smoothing required for reconstruction dominates (the reconstructed function is already smooth enough to sample at a higher rate than it started), so the size of the filter is determined by the input sample spacing (radius 1 in Figure 9.39).
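One way to express this sizing rule in code is to scale the filter by the larger of the two sample spacings, measured in input samples. The helper names below are hypothetical, and the rule as written (`max(1, Δx)`) is this sketch's reading of the paragraph above rather than a formula given in the text.

```python
def resampling_radius(base_radius, dx):
    """Radius of the resampling filter, in units of input samples.

    dx is the output sample spacing measured in input samples:
    dx > 1 means downsampling, dx < 1 means upsampling.
    """
    return base_radius * max(1.0, dx)

def scaled_filter(f, scale):
    """Stretch f by `scale` and divide by `scale` so its integral is unchanged."""
    return lambda x: f(x / scale) / scale

def tent(x):
    return max(0.0, 1.0 - abs(x))

# Shrinking a 3000-pixel-wide image to 1278 pixels: spacing of about 2.35 input samples.
dx_down = 3000 / 1278
radius_down = resampling_radius(1.0, dx_down)
filter_down = scaled_filter(tent, max(1.0, dx_down))

# Enlarging by a factor of 4: the filter keeps its base radius.
radius_up = resampling_radius(1.0, 0.25)
```

When downsampling, the filter widens with the output spacing so it averages over enough input samples; when upsampling, it stays at the size needed for reconstruction.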

Choosing the filter itself is a tradeoff between speed and quality. Common choices are the box filter (when speed is paramount), the tent filter (moderate quality), or a piecewise cubic (excellent quality). In the piecewise cubic case, the degree of smoothing can be adjusted by interpolating between $f_B$ and $f_C$; the Mitchell-Netravali filter is a good choice.

Figure 9.40. Resampling an image using a separable approach.

Just as with image filtering, separable filters can provide a significant speedup. The basic idea is to resample all the rows first, producing an image with changed width but not height, then to resample the columns of that image to produce the final result (Figure 9.40). Modifying the pseudocode given earlier so that it takes advantage of this optimization is reasonably straightforward.


9.5 Sampling Theory 

If you are only interested in implementation, you can stop reading here; the algorithms and recommendations in the previous sections will let you implement programs that perform sampling and reconstruction and achieve excellent results. However, there is a deeper mathematical theory of sampling with a history reaching back to the first uses of sampled representations in telecommunications. Sampling theory answers many questions that are difficult to answer with reasoning based strictly on scale arguments.

But most important, sampling theory gives valuable insight into the workings 
of sampling and reconstruction. It gives the student who learns it an extra set of 
intellectual tools for reasoning about how to achieve the best results with the most 
efficient code. 
9.5.1 The Fourier Transform

The Fourier transform, along with convolution, is the main mathematical concept 
that underlies sampling theory. You can read about the Fourier transform in many 
math books on analysis, as well as in books on signal processing. 

The basic idea behind the Fourier transform is to express any function by adding together sine waves (sinusoids) of all frequencies. By using the appropriate weights for the different frequencies, we can arrange for the sinusoids to add up to any (reasonable) function we want.

As an example, the square wave in Figure 9.41 can be expressed by a sequence of sine waves:

$$\sum_{n=1,3,5,\ldots}^{\infty} \frac{4}{\pi n} \sin 2\pi n x.$$

This Fourier series starts with a sine wave (sin 2πx) that has frequency 1.0, the same as the square wave, and the remaining terms add smaller and smaller corrections to reduce the ripples and, in the limit, reproduce the square wave exactly. Note that all the terms in the sum have frequencies that are integer multiples of the frequency of the square wave. This is because other frequencies would produce results that don’t have the same period as the square wave.
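The convergence of this series can be observed by evaluating partial sums. The sketch below assumes the square wave takes the value +1 on the first half of each period, so the target value at x = 0.25 is 1; the function name and the numbers of terms are arbitrary choices for illustration.

```python
import math

def square_wave_partial_sum(x, terms):
    """Sum of the first `terms` odd harmonics of (4 / (pi n)) sin(2 pi n x)."""
    return sum(4.0 / (math.pi * n) * math.sin(2 * math.pi * n * x)
               for n in range(1, 2 * terms, 2))

approx_few = square_wave_partial_sum(0.25, 3)
approx_many = square_wave_partial_sum(0.25, 2000)
```

With only three terms the sum is off by roughly ten percent at this point; with a few thousand terms it is within a small fraction of a percent of 1.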

A surprising fact is that a signal does not have to be periodic in order to be
expressed as a sum of sinusoids in this way: a non-periodic signal just requires
more sinusoids. Rather than summing over a discrete sequence of sinusoids, we
will instead integrate over a continuous family of sinusoids. For instance, a box
function can be written as the integral of a family of cosine waves:

$$\int_{-\infty}^{\infty} \frac{\sin \pi u}{\pi u} \cos 2\pi u x \, du. \tag{9.5}$$

Figure 9.42. Approximating a box function with integrals of cosines up to each of four cutoff
frequencies.

This integral in Equation (9.5) is adding up infinitely many cosines, weighting
the cosine of frequency $u$ by the weight $(\sin \pi u)/\pi u$. The result, as we include
higher and higher frequencies, converges to the box function (see Figure 9.42).
When a function $f$ is expressed in this way, this weight, which is a function of the
frequency $u$, is called the Fourier transform of $f$, denoted $\hat{f}$. The function $\hat{f}$ tells
us how to build $f$ by integrating over a family of sinusoids:

$$f(x) = \int_{-\infty}^{\infty} \hat{f}(u)\, e^{2\pi i u x} \, du. \tag{9.6}$$

Equation (9.6) is known as the inverse Fourier transform (IFT) because it starts
with the Fourier transform of $f$ and ends up with $f$.
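
To make the integral of Equation (9.5) concrete, the sketch below truncates it to frequencies |u| ≤ cutoff and evaluates it with a simple midpoint rule. The names `sinc_pi` and `box_from_cosines`, the cutoff values, and the step count are assumptions made for illustration; the box is taken to have width and height one, centered at zero.

```python
import math

def sinc_pi(u):
    """sin(pi u) / (pi u), using the limiting value 1 at u = 0."""
    if u == 0.0:
        return 1.0
    return math.sin(math.pi * u) / (math.pi * u)

def box_from_cosines(x, cutoff, steps=20000):
    """Midpoint-rule estimate of Equation (9.5) restricted to |u| <= cutoff."""
    h = 2.0 * cutoff / steps
    total = 0.0
    for k in range(steps):
        u = -cutoff + (k + 0.5) * h
        total += sinc_pi(u) * math.cos(2.0 * math.pi * u * x)
    return total * h

# Inside the box (x = 0 and x = 0.25) the value should approach 1;
# outside the box (x = 1) it should approach 0 as the cutoff grows.
for cutoff in (2.0, 5.0, 20.0):
    print(cutoff, [round(box_from_cosines(x, cutoff), 3) for x in (0.0, 0.25, 1.0)])
```

Raising the cutoff plays the same role as moving from one panel of Figure 9.42 to the next: the ripples shrink and the result hugs the box more closely.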

Note that in Equation (9.6) the complex exponential $e^{2\pi i u x}$ has been substituted
for the cosine in the previous equation. Also, $\hat{f}$ is a complex-valued function.
The machinery of complex numbers is required to allow the phase, as well
as the frequency, of the sinusoids to be controlled; this is necessary to represent
any functions that are not symmetric across zero. The magnitude of $\hat{f}$ is known
as the Fourier spectrum, and, for our purposes, this is sufficient: we won’t need
to worry about phase or use any complex numbers directly.

Footnote: Note that the term “Fourier transform” is used both for the function $\hat{f}$ and for
the operation that computes $\hat{f}$ from $f$. Unfortunately, this rather ambiguous usage is standard.

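It turns out that computing $\hat{f}$ from $f$ looks very much like computing $f$
from $\hat{f}$:

$$\hat{f}(u) = \int_{-\infty}^{\infty} f(x)\, e^{-2\pi i u x} \, dx. \tag{9.7}$$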

Equation (9.7) is known as the (forward) Fourier transform (FT). The sign in 
the exponential is the only difference between the forward and inverse Fourier 
transforms, and it is really just a technical detail. For our purposes, we can think 
of the FT and IFT as the same operation. 

Sometimes the $f$–$\hat{f}$ notation is inconvenient, and then we will denote the
Fourier transform of $f$ by $\mathcal{F}\{f\}$ and the inverse Fourier transform of $\hat{f}$ by $\mathcal{F}^{-1}\{\hat{f}\}$.

A function and its Fourier transform are related in many useful ways. A few 
facts (most of them easy to verify) that we will use later in the chapter are: 

• A function and its Fourier transform have the same squared integral:

$$\int (f(x))^2 \, dx = \int |\hat{f}(u)|^2 \, du.$$

The physical interpretation is that the two have the same energy (Figure
9.43).

In particular, scaling a function up by $a$ also scales its Fourier transform by
$a$. That is, $\mathcal{F}\{af\} = a\mathcal{F}\{f\}$.

• Stretching a function along the x-axis squashes its Fourier transform along
the u-axis by the same factor (Figure 9.44):

$$\mathcal{F}\{f(x/b)\} = b\hat{f}(bu).$$

(The renormalization by $b$ is needed to keep the energy the same.)

This means that if we are interested in a family of functions of different 
width and height (say all box functions centered at zero), then we only 
need to know the Fourier transform of one canonical function (say the box 
function with width and height equal to one), and we can easily know the 
Fourier transforms of all the scaled and dilated versions of that function. 


Figure 9.43. The Fourier transform preserves the squared integral of the signal.

Figure 9.44. Scaling a signal along the x-axis in the space domain causes an inverse scale
along the u-axis in the frequency domain.


For example, we can instantly generalize Equation (9.5) to give the Fourier
transform of a box of width $b$ and height $a$:

$$ab\,\frac{\sin \pi b u}{\pi b u}.$$


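• The average value of $f$ (strictly speaking, its integral over the whole domain) is
equal to $\hat{f}(0)$. This makes sense since $\hat{f}(0)$ is supposed to be the zero-frequency
component of the signal (the DC component if we are thinking of an electrical voltage).

• If $f$ is real (which it always is for us), $\hat{f}$ is an even function, that is,
$\hat{f}(u) = \hat{f}(-u)$ (more precisely, the two values are complex conjugates, so it is the
magnitude of $\hat{f}$ that is exactly even). Likewise, if $f$ is an even function then $\hat{f}$
will be real (this is not usually the case in our domain, but remember that we really
are only going to care about the magnitude of $\hat{f}$).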
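
The scaling and dilation facts can be spot-checked on the box of width b and height a. The sketch below integrates the forward transform numerically and compares it with the closed form a·b·sin(πbu)/(πbu); the helper names (`scaled_box_transform`, `predicted`, `sinc_pi`) and the midpoint-rule integration are assumptions for illustration only.

```python
import cmath
import math

def sinc_pi(u):
    """sin(pi u) / (pi u), using the limiting value 1 at u = 0."""
    return 1.0 if u == 0.0 else math.sin(math.pi * u) / (math.pi * u)

def scaled_box_transform(a, b, u, steps=4000):
    """Midpoint-rule estimate of the transform of a box of height a and width b."""
    h = b / steps
    total = 0j
    for k in range(steps):
        x = -0.5 * b + (k + 0.5) * h   # sample points cover only the box's support
        total += a * cmath.exp(-2j * math.pi * u * x)
    return total * h

def predicted(a, b, u):
    """Closed form from the dilation and scaling properties: a * b * sinc(pi b u)."""
    return a * b * sinc_pi(b * u)

for u in (0.0, 0.2, 0.4, 1.0):
    numeric = scaled_box_transform(2.0, 3.0, u)
    print(u, round(numeric.real, 5), round(predicted(2.0, 3.0, u), 5))
```

At u = 0 both columns give a·b, the integral of the box, which is the zero-frequency property stated in the list above.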


9.5.2 Convolution and the Fourier Transform 

One final property of the Fourier transform that deserves special mention is its 
relationship to convolution (Figure 9.45). Briefly, 


$$\mathcal{F}\{f \star g\} = \hat{f}\,\hat{g}.$$

The Fourier transform of the convolution of two functions is the product of the
Fourier transforms. Following the by now familiar symmetry,

$$\hat{f} \star \hat{g} = \mathcal{F}\{fg\}.$$

The convolution of two Fourier transforms is the Fourier transform of the product
of the two functions. These facts are fairly straightforward to derive from the
definitions.

Figure 9.45. A commutative diagram to show visually the relationship between convolution
and multiplication. If we multiply f and g in space, then transform to frequency, we end up in
the same place as if we transformed f and g to frequency and then convolved them. Likewise,
if we convolve f and g in space and then transform into frequency, we end up in the same
place as if we transformed f and g to frequency, then multiplied them.

This relationship is the main reason Fourier transforms are useful in studying 
the effects of sampling and reconstruction. We’ve seen how sampling, filtering, 
and reconstruction can be seen in terms of convolution; now the Fourier transform 
gives us a new domain—the frequency domain—in which these operations are 
simply products. 
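
A discrete analogue of this relationship is easy to test. The sketch below uses a naive discrete Fourier transform and circular convolution as a stand-in for the continuous operations in the text; that substitution, the helper names `dft` and `circular_convolve`, and the sample data are all assumptions made for illustration.

```python
import cmath
import math

def dft(signal):
    """Naive discrete Fourier transform (O(n^2)), adequate for short lists."""
    n = len(signal)
    return [sum(signal[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
            for m in range(n)]

def circular_convolve(a, b):
    """Circular convolution of two equal-length sequences."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]

a = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, 0.0, 3.0]
b = [0.25, 0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0]   # a small smoothing filter

transform_of_convolution = dft(circular_convolve(a, b))
product_of_transforms = [fa * fb for fa, fb in zip(dft(a), dft(b))]
max_error = max(abs(p - q)
                for p, q in zip(transform_of_convolution, product_of_transforms))
print(max_error)   # differs from zero only by floating-point rounding
```

Because the two sides agree, filtering can be reasoned about as a per-frequency multiplication, which is the viewpoint the rest of the section relies on.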


9.5.3 A Gallery of Fourier Transforms 

Now that we have some facts about Fourier transforms, let’s look at some examples
of individual functions. In particular, we’ll look at some filters from Section 9.3.1,
which are shown with their Fourier transforms in Figure 9.46. We have
already seen the box function:

$$\mathcal{F}\{f_{\mathrm{box}}\} = \frac{\sin \pi u}{\pi u} = \operatorname{sinc} \pi u.$$

The function $\sin x / x$ is important enough to have its own name, $\operatorname{sinc} x$.

Footnote: You may notice that $\sin \pi u / \pi u$ is undefined for $u = 0$. It is, however,
continuous across zero, and we take it as understood that we use the limiting value of this
ratio, 1, at $u = 0$.

Figure 9.46. The Fourier transforms of the box, tent, B-spline, and Gaussian filters.


The tent function is the convolution of the box with itself, so its Fourier transform
is just the square of the Fourier transform of the box function:

$$\mathcal{F}\{f_{\mathrm{tent}}\} = \frac{\sin^2 \pi u}{\pi^2 u^2} = \operatorname{sinc}^2 \pi u.$$
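
The tent result can be confirmed by integrating the forward transform directly. In this sketch the tent is taken to be 1 − |x| on [−1, 1] (the convolution of the unit box with itself), and `fourier_transform` is a hypothetical midpoint-rule helper, not a routine from the text.

```python
import cmath
import math

def tent(x):
    """Tent filter of radius 1: the unit box convolved with itself."""
    return max(0.0, 1.0 - abs(x))

def sinc_pi(u):
    """sin(pi u) / (pi u), using the limiting value 1 at u = 0."""
    return 1.0 if u == 0.0 else math.sin(math.pi * u) / (math.pi * u)

def fourier_transform(f, u, lo, hi, steps=4000):
    """Midpoint-rule estimate of the forward transform of f over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0j
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += f(x) * cmath.exp(-2j * math.pi * u * x)
    return total * h

# Compare the numerical transform of the tent with the square of the box's transform.
for u in (0.0, 0.5, 1.0, 1.5):
    numeric = fourier_transform(tent, u, -1.0, 1.0)
    print(u, round(numeric.real, 6), round(sinc_pi(u) ** 2, 6))
```

Squaring the box's transform makes the side lobes fall off faster, which is why the tent leaks less high-frequency content than the box.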


We can continue this process to get the Fourier transform of the B-spline filter
(see Exercise 3):

$$\mathcal{F}\{f_B\} = \frac{\sin^4 \pi u}{\pi^4 u^4} = \operatorname{sinc}^4 \pi u.$$

The Gaussian has a particularly nice Fourier transform:

$$\mathcal{F}\{f_G\} = e^{-(2\pi u)^2/2}.$$

It is another Gaussian! The Gaussian with standard deviation 1.0 becomes a Gaussian
with standard deviation $1/2\pi$.


























9.5.4 Dirac Impulses in Sampling Theory 

The reason impulses are useful in sampling theory is that we can use them to talk
about samples in the context of continuous functions and Fourier transforms. We
represent a sample, which has a position and a value, by an impulse translated
to that position and scaled by that value. A sample at position $a$ with value $b$ is
represented by $b\,\delta(x - a)$. This way we can express the operation of sampling the
function $f(x)$ at $a$ as multiplying $f$ by $\delta(x - a)$. The result is $f(a)\,\delta(x - a)$.

Sampling a function at a series of equally spaced points is therefore expressed
as multiplying the function by the sum of a series of equally spaced impulses,
called an impulse train (Figure 9.47). An impulse train with period $T$, meaning
that the impulses are spaced a distance $T$ apart, is

$$s_T(x) = \sum_{i=-\infty}^{\infty} \delta(x - Ti).$$


The Fourier transform of $s_1$ is the same as $s_1$: a sequence of impulses at all
integer frequencies. You can see why this should be true by thinking about what
happens when we multiply the impulse train by a sinusoid and integrate. We wind
up adding up the values of the sinusoid at all the integers. This sum will exactly
cancel to zero for non-integer frequencies, and it will diverge to $+\infty$ for integer
frequencies.

Because of the dilation property of the Fourier transform, we can guess that
the Fourier transform of an impulse train with period $T$ (which is like a dilation
of $s_1$) is an impulse train with period $1/T$. Making the sampling finer in the space
domain makes the impulses farther apart in the frequency domain.
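
A finite, periodic version of this behavior can be observed with a discrete Fourier transform. The sketch below is only an analogue of the continuous statement: it assumes a length-24 periodic signal, and the helper names `dft`, `impulse_train`, and `nonzero_bins` are hypothetical.

```python
import cmath
import math

def dft(signal):
    """Naive discrete Fourier transform (O(n^2)), adequate for short lists."""
    n = len(signal)
    return [sum(signal[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
            for m in range(n)]

def impulse_train(n, period):
    """Length-n sequence with a unit impulse every `period` samples."""
    return [1.0 if k % period == 0 else 0.0 for k in range(n)]

def nonzero_bins(signal, tol=1e-9):
    """Indices of the frequency bins whose magnitude exceeds tol."""
    return [m for m, value in enumerate(dft(signal)) if abs(value) > tol]

print(nonzero_bins(impulse_train(24, 8)))   # widely spaced impulses in space
print(nonzero_bins(impulse_train(24, 4)))   # finer spacing in space
```

Halving the spacing of the impulses in space doubles the spacing of the nonzero bins in frequency, mirroring the inverse relationship between T and 1/T.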



Figure 9.47. Impulse trains. The Fourier transform of an impulse train is another impulse
train. Changing the period of the impulse train in space causes an inverse change in the
period in frequency.

9.5.5 Sampling and Aliasing

Now we have built the mathematical machinery we need to understand the sampling
and reconstruction process from the viewpoint of the frequency domain.
The key advantage of introducing Fourier transforms is that it makes the effects
of convolution filtering on the signal much clearer, and it provides more precise
explanations of why we need to filter when sampling and reconstructing.

We start the process with the original, continuous signal. In general its Fourier 
transform could include components at any frequency, although for most kinds of 
signals (especially images), we expect the content to decrease as the frequency 
gets higher. Images also tend to have a large component at zero frequency — 
remember that the zero-frequency, or DC, component is the integral of the whole 
image, and since images are all positive values this tends to be a large number. 

Let’s see what happens to the Fourier transform if we sample and reconstruct
without doing any special filtering (Figure 9.48). When we sample the signal, we
model the operation as multiplication with an impulse train; the sampled signal is
$f s_T$. Because of the multiplication-convolution property, the FT of the sampled
signal is $\hat{f} \star \hat{s}_T = \hat{f} \star s_{1/T}$.



Figure 9.48. Sampling and reconstruction with no filtering. Sampling produces alias spectra 
that overlap and mix with the base spectrum. Reconstruction with a box filter collects even 
more information from the alias spectra. The result is a signal that has serious aliasing 
artifacts. 























































Plate I. The RGB color cube in 3D and its faces unfolded. Any RGB color is a point in the
cube. (See also Figure 3.13.)

Plate II. A colored triangle with barycentric interpolation. Note that the changes in color
components are linear in each row and column as well as along each edge. In fact it is
constant along every line, such as the diagonals, as well. (See also Figure 8.5.)

Plate III. Left: a Phong-illuminated image. Middle: cool-to-warm shading is not useful
without silhouettes. Right: cool-to-warm shading plus silhouettes. Image courtesy Amy
Gooch. (See also Figure 10.9.)

Plate IV. The color of the glass is affected by total internal reflection and Beer’s Law. The
amount of light transmitted and reflected is determined by the Fresnel Equations. The
complex lighting on the ground plane was computed using particle tracing as described in
Chapter 24. (See also Figure 13.3.)

Plate V. An example of depth of field. The caustic in the shadow of the wine glass is
computed using particle tracing (Chapter 24). (See also Figure 13.16.)

Plate VI. “Spiral Stairs.” A complex BlobTree implicit model created in Erwin DeGroot's
BlobTree.net system. (See also Figure 16.28.)

Plate VII. “The Next Step.” A complex BlobTree implicit model created interactively in Ryan
Schmidt’s Shapeshop by artist, Corien Clapwijk (Andusan). (See also Figure 16.31.)

Plate VIII. Each sphere is rendered using only a vertex shader that computes Phong
shading. Because the computation is being performed on a per-vertex basis, the Phong
highlight only begins to appear accurate after the amount of geometry used to model the
sphere is increased drastically. (See also Figure 18.7.)

Plate IX. The results of running the fragment shader from Section 18.3.4. Note that the
Phong highlight does appear on the left-most model which is represented by a single
polygon. In fact, because lighting is calculated at the fragment, rather than at each vertex,
the more coarsely tessellated sphere models also demonstrate appropriate Phong shading.
(See also Figure 18.8.)

Plate X. The visible spectrum. Wavelengths are in nanometers.

Plate XI. HSV color space. Hue varies around the circle, saturation varies with radius, and
value varies with height.

Plate XII. Which color is closer to red: green or violet?

Plate XIII. The effect shown in Figure 22.29 is even more powerful when shown in color.
Figure courtesy Albert Yonas.

Plate XIV. Per-channel gamma correction may desaturate the image. The left image was
desaturated with a value of s = 0.5. The right image was not desaturated (s = 1). (See also
Figure 23.11.)

Plate XV. Image used for demonstrating the color transfer technique. Results are shown in
Color Plates XVI and XVIII. (See also Figure 23.12 and Figure 23.30.)

Plate XVI. The image on the left is used to adjust the colors of the image shown in Color
Plate XV. The result is shown on the right. (See also Figure 23.13.)

Plate XVII. Linear interpolation for color correction. The parameter c is set to 0.0 in the left
image and to 1.0 in the right image. (See also Figure 23.24.)

Plate XVIII. The image on the left is used to transform the image of Color Plate XV into a
night scene, shown here on the right. (See also Figure 23.31.)

Plate XX. Aerial perspective, in which atmospheric effects reduce contrast and shift colors
towards blue, provides a depth cue over long distances.

Plate XXI. A comparison between a rendering and a photo. Figure courtesy Sumant
Pattanaik and the Cornell Program of Computer Graphics. (See also Figure 24.9.)

Plate XXII. The image shows extreme motion blur effects. The shadows use distribution ray
tracing because they are moving during the image. Model by Joseph Hamdorf and Young
Song. Rendering by Eric Levin.

Plate XXIII. Distribution ray-traced images with 1 sample per pixel, 16 samples per pixel,
and 256 samples per pixel. Images courtesy Jason Waltman.

Plate XXIV. Top: A diffuse shading model is used. Bottom: Subsurface scattering is allowed
using a technique from “A Practical Model for Sub-surface Light Transport,” Jensen et al.,
Proceedings of SIGGRAPH 2001. Images courtesy Henrik Jensen.

Plate XXV. Ray-traced and photon-mapped image of an interior. Most of the lighting is
indirect. Image courtesy Henrik Jensen.

Plate XXVI. The brightly colored pattern in the shadow is a “caustic” and is a product of light
focused through the glass. It was computed using photon tracing. Image courtesy Henrik
Jensen.

Plate XXVII. Top: A set of ellipsoids approximates the model. Bottom: The ellipsoids are
used to create a gravity-like implicit function which is then displaced. Image courtesy Eric
Levin.

Recall that $\delta$ is the identity for convolution. This means that

$$(\hat{f} \star s_{1/T})(u) = \sum_{i=-\infty}^{\infty} \hat{f}(u - i/T);$$

that is, convolving with the impulse train makes a whole series of equally spaced
copies of the spectrum of $f$. A good intuitive interpretation of this seemingly odd
result is that all those copies just express the fact (as we saw back in Section 9.1.1) 
that frequencies that differ by an integer multiple of the sampling frequency are 
indistinguishable once we have sampled—they will produce exactly the same set 
of samples. The original spectrum is called the base spectrum and the copies are 
known as alias spectra. 
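
The claim that frequencies differing by a multiple of the sampling frequency produce identical samples is easy to verify. In this sketch the sample rate of 8 and the frequencies 1 and 9 are arbitrary illustrative values, and `sample_sinusoid` is a hypothetical helper.

```python
import math

def sample_sinusoid(frequency, sample_rate, count):
    """Return `count` samples of sin(2 pi f x) taken at x = i / sample_rate."""
    return [math.sin(2.0 * math.pi * frequency * i / sample_rate)
            for i in range(count)]

sample_rate = 8.0
low = sample_sinusoid(1.0, sample_rate, 16)
high = sample_sinusoid(1.0 + sample_rate, sample_rate, 16)   # one alias of frequency 1

# The two sample sets agree up to floating-point rounding.
max_difference = max(abs(p - q) for p, q in zip(low, high))
print(max_difference)
```

Once the samples are taken, nothing distinguishes the two sinusoids, so any reconstruction has to pick one of them; a lowpass reconstruction picks the lower frequency.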

The trouble begins if these copies of the signal’s spectrum overlap, which will
happen if the signal contains any significant content beyond half the sample
frequency. When this happens, the spectra add, and the information about different
frequencies is irreversibly mixed up. This is the first place aliasing can occur, and
if it happens here, it’s due to undersampling, that is, using too low a sample frequency
for the signal.

Suppose we reconstruct the signal using the nearest-neighbor technique. This
is equivalent to convolving with a box of width 1. (The discrete-continuous
convolution used to do this is the same as a continuous convolution with the series
of impulses that represent the samples.) The convolution-multiplication property
means that the spectrum of the reconstructed signal will be the product of the
spectrum of the sampled signal and the spectrum of the box. The resulting
reconstructed Fourier transform contains the base spectrum (though somewhat
attenuated at higher frequencies), plus attenuated copies of all the alias spectra. Because
the box has a fairly broad Fourier transform, these attenuated bits of alias spectra
are significant, and they are the second form of aliasing, due to an inadequate
reconstruction filter. These alias components manifest themselves in the image as
the pattern of squares that is characteristic of nearest-neighbor reconstruction.

Preventing Aliasing in Sampling 

To do high quality sampling and reconstruction, we have seen that we need to
choose sampling and reconstruction filters appropriately. From the standpoint of
the frequency domain, the purpose of lowpass filtering when sampling is to limit
the frequency range of the signal so that the alias spectra do not overlap the base
spectrum. Figure 9.49 shows the effect of sample rate on the Fourier transform of
the sampled signal. Higher sample rates move the alias spectra farther apart, and
eventually whatever overlap is left does not matter.

Figure 9.49. The effect of sample rate on the frequency spectrum of the sampled signal.
Higher sample rates push the copies of the spectrum apart, reducing problems caused by
overlap.


The key criterion is that the width of the spectrum must be less than the distance
between the copies; that is, the highest frequency present in the signal
must be less than half the sample frequency. This is known as the Nyquist criterion,
and the highest allowable frequency is known as the Nyquist frequency or
Nyquist limit. The Nyquist-Shannon sampling theorem states that a signal whose
frequencies do not exceed the Nyquist limit (or, said another way, a signal that is
bandlimited to the Nyquist frequency) can, in principle, be reconstructed exactly
from samples.
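
As a rough illustration of the Nyquist limit, the helper below folds any frequency into the range from zero to half the sample rate. The name `apparent_frequency` is hypothetical, and the folding rule assumes a pure sinusoid: the folded frequency yields the same samples as the original, up to a sign flip when the sinusoid is a sine rather than a cosine.

```python
def apparent_frequency(frequency, sample_rate):
    """Fold a non-negative frequency into [0, sample_rate / 2]."""
    folded = frequency % sample_rate          # aliases repeat every sample_rate
    return min(folded, sample_rate - folded)  # reflect about the Nyquist limit

sample_rate = 8.0   # the Nyquist limit is therefore 4.0
for frequency in (1.0, 3.0, 5.0, 7.0, 9.0):
    print(frequency, apparent_frequency(frequency, sample_rate))
```

Frequencies at or below the Nyquist limit come back unchanged, while anything above it is reported as a lower frequency, which is exactly the content that a sampling filter has to remove beforehand.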

With a high enough sample rate for a particular signal, we don’t need to use
a sampling filter. But if we are stuck with a signal that contains a wide range of
frequencies (such as an image with sharp edges in it), we must use a sampling
filter to bandlimit the signal before we can sample it. Figure 9.50 shows the
effects of three lowpass (smoothing) filters in the frequency domain, and Figure
9.51 shows the effect of using these same filters when sampling. Even if the
spectra overlap without filtering, convolving the signal with a lowpass filter can
narrow the spectrum enough to eliminate overlap and produce a well-sampled
representation of the filtered signal. Of course, we have lost the high frequencies,
but that’s better than having them get scrambled with the signal and turn into
artifacts.

Figure 9.50. Applying lowpass (smoothing) filters narrows the frequency spectrum of a
signal.

Figure 9.51. How the lowpass filters from Figure 9.50 prevent aliasing during sampling.
Lowpass filtering narrows the spectrum so that the copies overlap less, and the high
frequencies from the alias spectra interfere less with the base spectrum.

Figure 9.52. The effects of different reconstruction filters in the frequency domain. A
good reconstruction filter attenuates the alias spectra effectively while preserving the base
spectrum.

Figure 9.53. Resampling viewed in the frequency domain. The resampling filter both
reconstructs the signal (removes the alias spectra) and bandlimits it (reduces its width) for
sampling at the new rate.

Preventing Aliasing in Reconstruction 

From the frequency domain perspective, the job of a reconstruction filter is to remove
the alias spectra while preserving the base spectrum. In Figure 9.48, we can
see that the crudest reconstruction filter, the box, does attenuate the alias spectra.
Most important, it completely blocks the DC spike for all the alias spectra.
This is a characteristic of all reasonable reconstruction filters: they have zeroes
in frequency space at all multiples of the sample frequency. This turns out to be
equivalent to the ripple-free property in the space domain.

So a good reconstruction filter needs to be a good lowpass filter, with the
added requirement of completely blocking all multiples of the sample frequency.
The purpose of using a reconstruction filter different from the box filter is to more
completely eliminate the alias spectra, reducing the leakage of high-frequency artifacts
into the reconstructed signal, while disturbing the base spectrum as little
as possible. Figure 9.52 illustrates the effects of different filters when used during
reconstruction. As we have seen, the box filter is quite “leaky” and results in
plenty of artifacts even if the sample rate is high enough. The tent filter, resulting
in linear interpolation, attenuates high frequencies more, resulting in milder
artifacts, and the B-spline filter is very smooth, controlling the alias spectra very
effectively. It also smooths the base spectrum some; this is the tradeoff between
smoothing and aliasing that we saw earlier.
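
The difference between the box and tent filters shows up even on a well-sampled signal. The sketch below reconstructs a slow sinusoid from unit-spaced samples with each filter and measures the worst-case error; the signal, the evaluation points, and the helper names (`box`, `tent`, `reconstruct`) are assumptions chosen for illustration.

```python
import math

def box(x):
    """Box filter of width 1 (nearest-neighbor reconstruction)."""
    return 1.0 if -0.5 <= x < 0.5 else 0.0

def tent(x):
    """Tent filter of radius 1 (linear interpolation)."""
    return max(0.0, 1.0 - abs(x))

def reconstruct(samples, kernel, x):
    """Discrete-continuous convolution: sum of samples[i] * kernel(x - i)."""
    return sum(value * kernel(x - i) for i, value in enumerate(samples))

def signal(x):
    """A sinusoid with 1/16 cycle per sample, far below the Nyquist limit of 1/2."""
    return math.sin(2.0 * math.pi * x / 16.0)

samples = [signal(i) for i in range(17)]
points = [2.0 + 0.05 * k for k in range(241)]   # stay clear of the array ends

box_error = max(abs(reconstruct(samples, box, x) - signal(x)) for x in points)
tent_error = max(abs(reconstruct(samples, tent, x) - signal(x)) for x in points)
print(round(box_error, 4), round(tent_error, 4))
```

The box filter's error is several times larger than the tent's, which is the space-domain counterpart of the leakier box spectrum in Figure 9.52.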

Preventing Aliasing in Resampling 

When the operations of reconstruction and sampling are combined in resampling,
the same principles apply, but with one filter doing the work of both reconstruction
and sampling. Figure 9.53 illustrates how a resampling filter must remove the
alias spectra and leave the spectrum narrow enough to be sampled at the new
sample rate.

9.5.6 Ideal Filters vs. Useful Filters

Following the frequency domain analysis to its logical conclusion, a filter that is
exactly a box in the frequency domain is ideal for both sampling and reconstruction.
Such a filter would prevent aliasing at both stages without diminishing the
frequencies below the Nyquist frequency at all.




Recall that the inverse and forward Fourier transforms are essentially identical,
so the spatial domain filter that has a box as its Fourier transform is the
function $\sin \pi x / \pi x = \operatorname{sinc} \pi x$.

However, the sinc filter is not generally used in practice, either for sampling or
for reconstruction, because it is impractical and because, even though it is optimal
according to the frequency domain criteria, it doesn't produce the best results for
many applications.

For sampling, the infinite extent of the sinc filter, and its relatively slow rate
of decrease with distance from the center, is a liability. Also, for some kinds of
sampling, the negative lobes are problematic. A Gaussian filter makes an excellent
sampling filter even for difficult cases where high-frequency patterns must be
removed from the input signal, because its Fourier transform falls off exponentially,
with no bumps that tend to let aliases leak through. For less difficult cases,
a tent filter generally suffices.

For reconstruction, the size of the sinc function again creates problems, but
even more importantly, the many ripples create “ringing” artifacts in reconstructed
signals.
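
The ringing can be reproduced with a short experiment. The sketch below reconstructs a sharp edge from sixteen unit-spaced samples, once with the sinc filter and once with a tent filter; the sample values, the evaluation range, and the helper names are illustrative assumptions, and the finite sample array means the sinc sum is necessarily truncated.

```python
import math

def sinc_pi(x):
    """sin(pi x) / (pi x), using the limiting value 1 at x = 0."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def tent(x):
    """Tent filter of radius 1 (linear interpolation)."""
    return max(0.0, 1.0 - abs(x))

def reconstruct(samples, kernel, x):
    """Discrete-continuous convolution: sum of samples[i] * kernel(x - i)."""
    return sum(value * kernel(x - i) for i, value in enumerate(samples))

samples = [0.0] * 8 + [1.0] * 8                  # a sharp edge between samples 7 and 8
points = [4.0 + 0.125 * k for k in range(65)]    # x from 4 to 12

sinc_values = [reconstruct(samples, sinc_pi, x) for x in points]
tent_values = [reconstruct(samples, tent, x) for x in points]

print(round(min(sinc_values), 3), round(max(sinc_values), 3))
print(round(min(tent_values), 3), round(max(tent_values), 3))
```

The sinc reconstruction dips below 0 and rises above 1 near the edge, whereas the tent reconstruction stays within the range of the samples. In an image, those excursions appear as halos beside sharp edges.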


Exercises 

1. Show that discrete convolution is commutative and associative. Do the 
same for continuous convolution. 

2. Discrete-continuous convolution can’t be commutative, because its arguments
have two different types. Show that it is associative, though.

3. Prove that the B-spline is the convolution of four box functions.

4. Show that the “flipped” definition of convolution is necessary by trying to
show that convolution is commutative and associative using this (incorrect)
definition (see the footnote on page 194):

$$(a \star b)[i] = \sum_j a[j]\, b[i + j]$$

5. Prove that $\mathcal{F}\{f \star g\} = \hat{f}\hat{g}$ and $\hat{f} \star \hat{g} = \mathcal{F}\{fg\}$.



Surface Shading 


To make objects appear to have more volume, it can help to use shading, i.e., the 
surface is “painted” with light. This chapter presents the most common heuristic 
shading methods. The first two, diffuse and Phong shading, were developed in the 
1970s and are available in most graphics libraries. The last, artistic shading, uses 
artistic conventions to assign color to objects. This creates images reminiscent of 
technical drawings, which is desirable in many applications. 


10.1 Diffuse Shading 


Many objects in the world have a surface appearance loosely described as “matte,”
indicating that the object is not at all shiny. Examples include paper, unfinished
wood, and dry unpolished stones. To a large degree, such objects do not have a
color change with a change in viewpoint. For example, if you stare at a particular
point on a piece of paper and move while keeping your gaze fixed on that
point, the color at that point will stay relatively constant. Such matte objects can
be considered as behaving as Lambertian objects. This section discusses how to
implement the shading of such objects. A key point is that all formulas in this
chapter should be evaluated in world coordinates and not in the warped coordinates
after the perspective transform is applied. Otherwise, the angles between
normals are changed and the shading will be inaccurate.

Figure 10.1. The geometry for Lambert's Law. Both n and l are unit vectors.

Figure 10.2. When a surface points away from the light, it should receive no light. This case
can be verified by checking whether the dot product of l and n is negative.


10.1.1 Lambertian Shading Model 

A Lambertian object obeys Lambert’s cosine law, which states that the color $c$
of a surface is proportional to the cosine of the angle between the surface normal
and the direction to the light source (Gouraud, 1971):

$$c \propto \cos\theta,$$

or in vector form,

$$c \propto \mathbf{n} \cdot \mathbf{l},$$

where $\mathbf{n}$ and $\mathbf{l}$ are shown in Figure 10.1. Thus, the color on the surface will
vary according to the cosine of the angle between the surface normal and the
light direction. Note that the vector $\mathbf{l}$ is typically assumed not to depend on the
location of the object. That assumption is equivalent to assuming the light is
“distant” relative to object size. Such a “distant” light is often called a directional
light, because its position is specified only by a direction.

A surface can be made lighter or darker by changing the intensity of the light
source or the reflectance of the surface. The diffuse reflectance $c_r$ is the fraction
of light reflected by the surface. This fraction will be different for different color
components. For example, a surface is red if it reflects a higher fraction of red
incident light than blue incident light. If we assume surface color is proportional
to the light reflected from a surface, then the diffuse reflectance $c_r$, an RGB
color, must also be included:

$$c \propto c_r\, \mathbf{n} \cdot \mathbf{l}. \tag{10.1}$$

The right-hand side of Equation (10.1) is an RGB color with all RGB components
in the range [0,1]. We would like to add the effects of light intensity while keeping
the RGB components in the range [0,1]. This suggests adding an RGB intensity
term $c_l$ which itself has components in the range [0,1]:

$$c = c_r c_l\, \mathbf{n} \cdot \mathbf{l}. \tag{10.2}$$

This is a very convenient form, but it can produce RGB components for $c$ that
are outside the range [0,1], because the dot product can be negative. The dot
product is negative when the surface is pointing away from the light as shown in
Figure 10.2.

The “max” function can be added to Equation (10.2) to test for that case:

$$c = c_r c_l \max(0, \mathbf{n} \cdot \mathbf{l}). \tag{10.3}$$

Another way to deal with the “negative” light is to use an absolute value:

$$c = c_r c_l\, |\mathbf{n} \cdot \mathbf{l}|. \tag{10.4}$$

While Equation (10.4) may seem physically implausible, it actually corresponds
to Equation (10.3) with two lights in opposite directions. For this reason it is often
called two-sided lighting (Figure 10.3).
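
Equations (10.3) and (10.4) translate almost directly into code. The following sketch represents RGB colors and vectors as plain tuples; the function name `diffuse`, the `two_sided` flag, and the choice to normalize the inputs inside the function are assumptions for illustration rather than anything prescribed by the text.

```python
import math

def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Return v scaled to unit length (v must be non-zero)."""
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)

def diffuse(c_r, c_l, n, l, two_sided=False):
    """Equation (10.3), or Equation (10.4) when two_sided is True."""
    n_dot_l = dot(normalize(n), normalize(l))
    factor = abs(n_dot_l) if two_sided else max(0.0, n_dot_l)
    # Reflectance and light color multiply component by component.
    return tuple(r * i * factor for r, i in zip(c_r, c_l))

red = (0.9, 0.2, 0.2)
white = (1.0, 1.0, 1.0)
up = (0.0, 0.0, 1.0)

print(diffuse(red, white, up, (0.0, 0.0, 1.0)))                    # light overhead
print(diffuse(red, white, up, (0.0, 0.0, -1.0)))                   # light behind the surface
print(diffuse(red, white, up, (0.0, 0.0, -1.0), two_sided=True))   # two-sided lighting
```

With the light behind the surface, the one-sided form returns black while the two-sided form returns the same color as the front-lit case, matching the two-opposing-lights interpretation.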


10.1.2 Ambient Shading 

Figure 10.3. Using Equation (10.4), the two-sided lighting formula, is equivalent to
assuming two opposing light sources of the same color.

One problem with the diffuse shading of Equation (10.3) is that any point whose
normal faces away from the light will be black. In real life, light is reflected all
over, and some light is incident from every direction. In addition, there is often
skylight giving “ambient” lighting. One way to handle this is to use several light
sources. A common trick is to always put a dim source at the eye so that all
visible points will receive some light. Another way is to use two-sided lighting
as described by Equation (10.4). A more common approach is to add an ambient
term (Gouraud, 1971). This is just a constant color term added to Equation (10.3):

$$c = c_r \left(c_a + c_l \max(0, \mathbf{n} \cdot \mathbf{l})\right).$$

Intuitively, you can think of the ambient color $c_a$ as the average color of all surfaces
in the scene. If you want to ensure that the computed RGB color stays in
the range $[0,1]^3$, then $c_a + c_l \le (1,1,1)$. Otherwise your code should “clamp”
RGB values above one to have the value one.
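
A minimal sketch of the ambient formula with clamping follows. It assumes `n` and `l` are already unit vectors and that colors are RGB tuples; the function name `shade_with_ambient` is hypothetical.

```python
def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def shade_with_ambient(c_r, c_a, c_l, n, l):
    """Ambient plus diffuse shading; n and l are assumed to be unit vectors."""
    n_dot_l = max(0.0, dot(n, l))
    color = (r * (a + i * n_dot_l) for r, a, i in zip(c_r, c_a, c_l))
    # Clamp each component so that values above one become exactly one.
    return tuple(min(1.0, component) for component in color)

gray = (1.0, 1.0, 1.0)
ambient = (0.5, 0.5, 0.5)
light = (1.0, 1.0, 1.0)          # ambient + light exceeds (1, 1, 1), so clamping matters
up = (0.0, 0.0, 1.0)

print(shade_with_ambient(gray, ambient, light, up, (0.0, 0.0, 1.0)))    # lit and clamped
print(shade_with_ambient(gray, ambient, light, up, (0.0, 0.0, -1.0)))   # ambient only
```

A surface facing away from the light now receives the ambient contribution instead of going black, and the clamp only takes effect when the ambient and light colors together exceed one.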


10.1.3 Vertex-Based Diffuse Shading 

If we apply Equation (10.1) to an object made up of triangles, it will typically
have a faceted appearance. Often, the triangles are an approximation to a smooth
surface. To avoid the faceted appearance, we can place surface normal vectors at
the vertices of the triangles (Phong, 1975), and apply Equation (10.3) at each of
the vertices using the normal vectors at the vertices (see Figure 10.4). This will
give a color at each triangle vertex, and this color can be interpolated using the
barycentric interpolation described in Section 8.1.2.

Figure 10.4. A circle (left) is approximated by an octagon (right). Vertex normals record the
surface normal of the original curve.

One problem with shading at triangle vertices is that we need to get the normals
from somewhere. Many models will come with normals supplied. If you
tessellate your own smooth model, you can create normals when you create the
triangles. If you are presented with a polygonal model that does not have normals
at vertices and you want to shade it smoothly, you can compute normals by
a variety of heuristic methods. The simplest is to just average the normals of the
triangles that share each vertex and use this average normal at the vertex. This
average normal will not automatically be of unit length, so you should convert it
to a unit vector before using it for shading.
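
The simple averaging heuristic can be sketched as follows. The mesh layout (a list of vertex positions plus a list of index triples with counterclockwise winding) and the function name `vertex_normals` are assumptions for illustration; the sketch does not handle degenerate triangles or vertices that belong to no triangle.

```python
import math

def subtract(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    length = math.sqrt(sum(x * x for x in v))
    return tuple(x / length for x in v)

def vertex_normals(vertices, triangles):
    """Average the unit normals of the triangles sharing each vertex."""
    sums = [(0.0, 0.0, 0.0) for _ in vertices]
    for i, j, k in triangles:
        face = normalize(cross(subtract(vertices[j], vertices[i]),
                               subtract(vertices[k], vertices[i])))
        for index in (i, j, k):
            sums[index] = tuple(s + f for s, f in zip(sums[index], face))
    # The summed normal is generally not unit length, so renormalize it.
    return [normalize(s) for s in sums]

# Two triangles meeting at a right angle along the edge from vertex 0 to vertex 2.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
triangles = [(0, 1, 2), (0, 2, 3)]
for normal in vertex_normals(vertices, triangles):
    print(tuple(round(x, 4) for x in normal))
```

Vertices on the shared edge receive a normal halfway between the two face normals, while the other two vertices keep the normal of their single triangle. Normalizing the sum gives the same direction as normalizing the average.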


10.2 Phong Shading 



Figure 10.5. The geometry for the Phong illumination model. The eye should see a
highlight if $\sigma$ is small.

Some surfaces are essentially like matte surfaces, but they have highlights. Examples
of such surfaces include polished tile floors, gloss paint, and whiteboards.
Highlights move across a surface as the viewpoint moves. This means that we
must add a unit vector $\mathbf{e}$ toward the eye into our equations. If you look carefully
at highlights, you will see that they are really reflections of the light; sometimes
these reflections are blurred. The color of these highlights is the color of the
light; the surface color seems to have little effect. This is because the reflection
occurs at the object’s surface, and the light that penetrates the surface and picks
up the object’s color is scattered diffusely.


10.2.1 Phong Lighting Model 

We want to add a fuzzy “spot” the same color as the light source in the right
place. The center of the spot should be drawn where the direction $\mathbf{e}$ to
the eye “lines up” with the natural direction of reflection $\mathbf{r}$ as shown
in Figure 10.5. Here “lines up” is mathematically equivalent to “where $\sigma$ is
zero.” We would like the highlight to have some non-zero area, so that the eye
sees some highlight wherever $\sigma$ is small.

Given r, we’d like a heuristic function that is bright when e = r and falls off 
gradually when e moves away from r. An obvious candidate is the cosine of the 
angle between them: 

$$c = c_l\,(\mathbf{e} \cdot \mathbf{r}).$$


Figure 10.6. The effect of the Phong exponent on highlight characteristics. This uses
Equation (10.5) for the highlight. There is also a diffuse component, giving the objects a
shiny but non-metallic appearance. Image courtesy Nate Robins.

Figure 10.7. The geometry for calculating the vector $\mathbf{r}$.

Figure 10.8. The unit vector $\mathbf{h}$ is halfway between $\mathbf{l}$ and $\mathbf{e}$.

There are two problems with using this equation. The first is that the dot product
can be negative. This can be solved computationally with an “if” statement that
sets the color to zero when the dot product is negative. The more serious problem
is that the highlight produced by this equation is much wider than that seen in real
life. The maximum is in the right place and it is the right color, but it is just too
big. We can narrow it without reducing its maximum color by raising to a power:

$$c = c_l \max(0, \mathbf{e} \cdot \mathbf{r})^p. \tag{10.5}$$

Here $p$ is called the Phong exponent; it is a positive real number (Phong, 1975).
The effect that changing the Phong exponent has on the highlight can be seen in 
Figure 10.6. 

To implement Equation (10.5), we first need to compute the unit vector $\mathbf{r}$.
Given unit vectors $\mathbf{l}$ and $\mathbf{n}$, $\mathbf{r}$ is the vector
$\mathbf{l}$ reflected about $\mathbf{n}$. Figure 10.7 shows that this vector can be
computed as

$$\mathbf{r} = -\mathbf{l} + 2(\mathbf{l} \cdot \mathbf{n})\mathbf{n}, \tag{10.6}$$

where the dot product is used to compute $\cos\theta$.
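
As a rough sketch of Equations (10.5) and (10.6), the following Python computes the reflected vector and the resulting highlight color. The function names `reflect` and `phong_highlight` and the use of plain tuples for vectors and RGB colors are assumptions made for illustration; all direction vectors are assumed to be unit length.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def reflect(l, n):
    """Equation (10.6): r = -l + 2 (l . n) n for unit vectors l and n."""
    d = dot(l, n)
    return tuple(-li + 2.0 * d * ni for li, ni in zip(l, n))

def phong_highlight(c_l, e, l, n, p):
    """Equation (10.5): c = c_l * max(0, e . r)^p, applied per RGB channel."""
    r = reflect(l, n)
    s = max(0.0, dot(e, r)) ** p
    return tuple(s * channel for channel in c_l)
```

The `max` call plays the role of the “if” statement mentioned above: a negative dot product contributes no highlight.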

An alternative heuristic model based on Equation (10.5) eliminates the need to
check for negative values of the number used as a base for exponentiation (Warn,
1983). Instead of $\mathbf{r}$, we compute $\mathbf{h}$, the unit vector halfway
between $\mathbf{l}$ and $\mathbf{e}$ (Figure 10.8):

$$\mathbf{h} = \frac{\mathbf{e} + \mathbf{l}}{\|\mathbf{e} + \mathbf{l}\|}.$$

The highlight occurs when $\mathbf{h}$ is near $\mathbf{n}$, i.e., when
$\cos\omega = \mathbf{h} \cdot \mathbf{n}$ is near 1. This suggests the rule:

$$c = c_l (\mathbf{h} \cdot \mathbf{n})^p. \tag{10.7}$$

The exponent $p$ here will have analogous control behavior to the exponent in
Equation (10.5), but the angle between $\mathbf{h}$ and $\mathbf{n}$ is half the
size of the angle between $\mathbf{e}$ and $\mathbf{r}$, so the details will be
slightly different. The advantage of using the cosine between $\mathbf{n}$ and
$\mathbf{h}$ is that it is always positive for eye and light above the plane. The
disadvantage is that a square root and a divide are needed to compute $\mathbf{h}$.

In practice, we want most materials to have a diffuse appearance in addition
to a highlight. We can combine Equations (10.3) and (10.7) to get

$$c = c_r\,(c_a + c_l \max(0, \mathbf{n} \cdot \mathbf{l})) + c_l (\mathbf{h} \cdot \mathbf{n})^p. \tag{10.8}$$

If we want to allow the user to dim the highlight, we can add a control term $c_p$:

$$c = c_r\,(c_a + c_l \max(0, \mathbf{n} \cdot \mathbf{l})) + c_l c_p (\mathbf{h} \cdot \mathbf{n})^p. \tag{10.9}$$

The term $c_p$ is an RGB color, which allows us to change highlight colors. This is
useful for metals where $c_p = c_r$, because highlights on metal take on a metallic
color. In addition, it is often useful to make $c_p$ a neutral value less than one, so
that colors stay below one. For example, setting $c_p = 1 - M$ where $M$ is the
maximum component of $c_r$ will keep colors below one for one light source and
no ambient term.
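
A minimal Python sketch of Equation (10.9) follows. The function name `shade` and the tuple representation of vectors and colors are assumptions; the guard `max(0, h . n)` and the final clamp to one are additions for robustness, not part of the equation itself.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)

def shade(c_r, c_a, c_l, c_p, n, l, e, p):
    """Equation (10.9): diffuse plus halfway-vector highlight, per RGB channel."""
    h = normalize(tuple(ei + li for ei, li in zip(e, l)))  # halfway vector
    diffuse = max(0.0, dot(n, l))
    spec = max(0.0, dot(h, n)) ** p  # guard added for lights below the plane
    return tuple(min(1.0, cr * (ca + cl * diffuse) + cl * cp * spec)
                 for cr, ca, cl, cp in zip(c_r, c_a, c_l, c_p))
```

With $c_r = (0.5, 0.2, 0.2)$, a white light, no ambient term, and $c_p = 1 - M = 0.5$ in every channel, a surface facing both the light and the eye evaluates to $(1.0, 0.7, 0.7)$, which illustrates why that choice of $c_p$ keeps colors from exceeding one.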


10.2.2 Surface Normal Vector Interpolation

Smooth surfaces with highlights tend to change color quickly compared to
Lambertian surfaces with the same geometry. Thus, shading only at the vertices,
using the normal vectors stored there, can generate disturbing artifacts.

These problems can be reduced by interpolating the normal vectors across the
polygon and then applying Phong shading at each pixel. This allows you to get
good images without making the size of the triangles extremely small. Recall
from Chapter 3 that when rasterizing a triangle, we compute barycentric
coordinates $(\alpha, \beta, \gamma)$ to interpolate the vertex colors $c_0$, $c_1$, $c_2$:

$$c = \alpha c_0 + \beta c_1 + \gamma c_2. \tag{10.10}$$

We can use the same equation to interpolate surface normals $\mathbf{n}_0$,
$\mathbf{n}_1$, and $\mathbf{n}_2$:

$$\mathbf{n} = \alpha \mathbf{n}_0 + \beta \mathbf{n}_1 + \gamma \mathbf{n}_2. \tag{10.11}$$

And Equation (10.9) can then be evaluated for the $\mathbf{n}$ computed at each
pixel. Note that the $\mathbf{n}$ resulting from Equation (10.11) is usually not a
unit normal. Better visual results will be achieved if it is converted to a unit
vector before it is used in shading computations. This type of normal
interpolation is often called Phong normal interpolation (Phong, 1975).
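
The per-pixel step can be sketched as below. The function name `interpolate_normal` is an assumption; the code simply applies Equation (10.11) and then renormalizes, as recommended above.

```python
import math

def interpolate_normal(alpha, beta, gamma, n0, n1, n2):
    """Equation (10.11) followed by conversion to a unit vector."""
    n = tuple(alpha * a + beta * b + gamma * c for a, b, c in zip(n0, n1, n2))
    length = math.sqrt(sum(x * x for x in n))
    return tuple(x / length for x in n)
```

Halfway along the edge between vertex normals $(1,0,0)$ and $(0,1,0)$, the raw blend $(0.5, 0.5, 0)$ has length of about 0.71, which shows how far from unit length the interpolated vector can be before normalization.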

10.3 Artistic Shading 

The Lambertian and Phong shading methods are based on heuristics designed to
imitate the appearance of objects in the real world. Artistic shading is designed to
mimic drawings made by human artists (Yessios, 1979; Dooley & Cohen, 1990;
Saito & Takahashi, 1990; L. Williams, 1991). Such shading seems to have
advantages in many applications. For example, auto manufacturers hire artists to
draw diagrams for car owners’ manuals. This is more expensive than using much more
“realistic” photographs, so there is probably some intrinsic advantage to the
techniques of artists when certain types of communication are needed. In this
section, we show how to make subtly shaded line drawings reminiscent of human-drawn
images. Creating such images is often called non-photorealistic rendering, but
we will avoid that term because many non-photorealistic techniques are used for
efficiency that are not related to any artistic practice.


10.3.1 Line Drawing 

The most obvious thing we see in human drawings that we don't see in real life is
silhouettes. When we have a set of triangles with shared edges, we should draw
an edge as a silhouette when one of the two triangles sharing an edge faces toward
the viewer, and the other triangle faces away from the viewer. This condition can
be tested for two normals $\mathbf{n}_0$ and $\mathbf{n}_1$ by

    draw silhouette if $(\mathbf{e} \cdot \mathbf{n}_0)(\mathbf{e} \cdot \mathbf{n}_1) < 0$.

Here $\mathbf{e}$ is a vector from the edge to the eye. Its starting point can be
any point on the edge or on either of the triangles. Alternatively, if
$f_i(\mathbf{p}) = 0$ are the implicit plane equations for the two triangles and
$\mathbf{e}$ is the eye position, the test can be written

    draw silhouette if $f_0(\mathbf{e})\, f_1(\mathbf{e}) < 0$.

We would also like to draw visible edges of a polygonal model. To do this, we 
can use either of the hidden surface methods of Chapter 12 for drawing in the 
background color and then draw the outlines of each triangle in black. This, in 
fact, will also capture the silhouettes. Unfortunately, if the polygons represent a 
smooth surface, we really don’t want to draw most of those edges. However, we 
might want to draw all creases where there really is a corner in the geometry. We 
can test for creases by using a heuristic threshold: 

    draw crease if $(\mathbf{n}_0 \cdot \mathbf{n}_1) < \text{threshold}$.

This combined with the silhouette test will give nice-looking line drawings. 
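
Both edge tests reduce to a dot product or two. The sketch below assumes each edge comes with the unit normals of its two adjacent triangles; the function names `is_silhouette` and `is_crease` are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def is_silhouette(e, n0, n1):
    """True when one adjacent triangle faces the eye and the other faces away."""
    return dot(e, n0) * dot(e, n1) < 0

def is_crease(n0, n1, threshold):
    """True when the adjacent unit normals differ by more than the threshold allows."""
    return dot(n0, n1) < threshold
```

A threshold of 0.5, for instance, marks as creases all edges whose adjacent normals differ by more than 60 degrees; the right value depends on the model.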


10.3.2 Cool-to-Warm Shading 

When artists shade line drawings, they often use low intensity shading to give
some impression of curve to the surface and to give colors to objects (Gooch et
al., 1998). Surfaces facing in one direction are shaded with a cool color, such
as a blue, and surfaces facing in the opposite direction are shaded with a warm
color, such as orange. Typically these colors are not very saturated and are also
not dark. That way, black silhouettes show up nicely. Overall this gives a
cartoon-like effect. This can be achieved by setting up a direction to a “warm”
light $\mathbf{l}$ and using the cosine to modulate color, where the warmth
constant $k_w$ is defined on $[0, 1]$:

$$k_w = \frac{1 + \mathbf{n} \cdot \mathbf{l}}{2}.$$

The color $c$ is then just a linear blend of the cool color $c_c$ and the warm color $c_w$:

$$c = k_w c_w + (1 - k_w) c_c.$$

Figure 10.9. Left: a Phong-illuminated image. Middle: cool-to-warm shading is not useful
without silhouettes. Right: cool-to-warm shading plus silhouettes. Image courtesy Amy
Gooch. (See also Plate III.)


There are many possible $c_w$ and $c_c$ that will produce reasonable looking results.
A good starting place for a guess is

    $c_c = (0.4, 0.4, 0.7)$,
    $c_w = (0.8, 0.6, 0.6)$.

Figure 10.9 shows a comparison between traditional Phong lighting and this type 
of artistic shading. 
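
The blend is short enough to write out directly. In this sketch the function name `cool_to_warm` is an assumption, the default colors are the starting guesses given above, and `n` and `l` are assumed to be unit vectors.

```python
def cool_to_warm(n, l, c_c=(0.4, 0.4, 0.7), c_w=(0.8, 0.6, 0.6)):
    """Blend the cool and warm colors by the warmth constant k_w = (1 + n . l) / 2."""
    k_w = (1.0 + sum(a * b for a, b in zip(n, l))) / 2.0
    return tuple(k_w * w + (1.0 - k_w) * c for w, c in zip(c_w, c_c))
```

A surface facing the warm light gets exactly $c_w$, one facing away gets $c_c$, and a surface perpendicular to the light gets their average.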


Frequently Asked Questions 

• All of the shading in this chapter seems like enormous hacks. Is that 
true? 

Yes. However, they are carefully designed hacks that have proven useful in
practice. In the long run, we will probably have better-motivated algorithms that
include physics, psychology, and tone-mapping. However, the improvements in
image quality will probably be incremental.

• I hate calling pow(). Is there a way to avoid it when doing Phong lighting?

A simple way is to only have exponents that are themselves a power of two,
i.e., 2, 4, 8, 16, .... In practice, this is not a problematic restriction for most
applications. A look-up table is also possible, but will often not give a large
speed-up.
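
When the exponent is $2^k$, the power can be computed with $k$ multiplications by repeated squaring. The helper below is a small sketch of that idea; the name `pow_by_squaring` and its `log2_p` parameter are assumptions.

```python
def pow_by_squaring(x, log2_p):
    """Return x raised to the power 2**log2_p using log2_p multiplications."""
    for _ in range(log2_p):
        x = x * x
    return x
```

For example, `pow_by_squaring(c, 5)` computes the cosine `c` raised to the 32nd power with five multiplications.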


Exercises 

1. The moon is poorly approximated by diffuse or Phong shading. What
observations tell you that this is true?

2. Velvet is poorly approximated by diffuse or Phong shading. What
observations tell you that this is true?

3. Why do most highlights on plastic objects look white, while those on gold
metal look gold?


Texture Mapping


The shading models presented in Chapter 10 assume that a diffuse surface has
uniform reflectance $c_r$. This is fine for surfaces such as blank paper or painted
walls, but it is inefficient for objects such as a printed sheet of paper. Such objects
have an appearance whose complexity arises from variation in reflectance
properties. While we could use such small triangles that the variation is captured by
varying the reflectance properties of the triangles, this would be inefficient.

The common technique to handle variations of reflectance is to store the
reflectance as a function or a pixel-based image and “map” it onto a surface
(Catmull, 1975). The function or image is called a texture map, and the process of
controlling reflectance properties is called texture mapping. This is not hard to
implement once you understand the coordinate systems involved. Texture
mapping can be classified by several different properties:

1. the dimensionality of the texture function, 

2. the correspondences defined between points on the surface and points in the 
texture function, and 

3. whether the texture function is primarily procedural or primarily a table 
look-up. 

These items are usually closely related, so we will somewhat arbitrarily classify
textures by their dimension. We first cover 3D textures, often called solid
textures or volume textures. We will then cover 2D textures, sometimes called image
textures. When graphics programmers talk about textures without specifying
dimension, they usually mean 2D textures. However, we begin with 3D textures
because, in many ways, they are easier to understand and implement. At the end
of the chapter we discuss bump mapping and displacement mapping, which use
textures to change surface normals and position, respectively. Although those
methods modify properties other than reflectance, the images/functions they use
are still called textures. This is consistent with common usage where any image
used to modify object appearance is called a texture.


11.1 3D Texture Mapping 

In previous chapters we used $c_r$ as the diffuse reflectance at a point on an object.
For an object that does not have a solid color, we can replace this with a function
$c_r(\mathbf{p})$ which maps 3D points to RGB colors (Peachey, 1985; Perlin, 1985). This
function might just return the reflectance of the object that contains $\mathbf{p}$. But for
objects with texture, we should expect $c_r(\mathbf{p})$ to vary as $\mathbf{p}$ moves across a surface.
One way to do this is to create a 3D texture that defines an RGB value at every
point in 3D space. We will only call it for points $\mathbf{p}$ on the surface, but it is usually
easier to define it for all 3D points than a potentially strange 2D subset of points
that are on an arbitrary surface. Such a strategy is clearly suitable for surfaces that
are “carved” from a solid medium, such as a marble sculpture.

Note that in a ray-tracing program, we have immediate access to the point p 
seen through a pixel. However, for a z-buffer or BSP-tree program, we only know 
the point after projection into device coordinates. We will show how to resolve 
this problem in Section 11.3.1. 


11.1.1 3D Stripe Textures 

There are a surprising number of ways to make a striped texture. Let’s assume we
have two colors $c_0$ and $c_1$ that we want to use to make the stripe color. We need
some oscillating function to switch between the two colors. An easy one is a sine:

    RGB stripe( point p )
        if (sin(x_p) > 0) then
            return c_0
        else
            return c_1

We can also make the stripe’s width w controllable:

    RGB stripe( point p, real w )
        if (sin(π x_p / w) > 0) then
            return c_0
        else
            return c_1

If we want to interpolate smoothly between the stripe colors, we can use a
parameter t to vary the color linearly:

    RGB stripe( point p, real w )
        t = (1 + sin(π x_p / w)) / 2
        return (1 − t) c_0 + t c_1

These three possibilities are shown in Figure 11.1. 
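
For readers who want to experiment, here is a direct Python transcription of the width-controlled and smooth stripe functions. The names `stripe` and `smooth_stripe`, the tuple representation of points and colors, and the default `w=1.0` are assumptions.

```python
import math

def stripe(p, c0, c1, w=1.0):
    """Hard-edged stripes of width w along the x-axis."""
    return c0 if math.sin(math.pi * p[0] / w) > 0 else c1

def smooth_stripe(p, c0, c1, w=1.0):
    """Stripes that blend linearly between c0 and c1 with parameter t."""
    t = (1.0 + math.sin(math.pi * p[0] / w)) / 2.0
    return tuple((1.0 - t) * a + t * b for a, b in zip(c0, c1))
```

Only the x-coordinate of the point is used, so the stripes are slabs perpendicular to the x-axis; a surface carved from this solid texture shows stripes wherever it crosses them.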


11.1.2 Texture Arrays 

Another way we can specify texture in space is to store a 3D array of color values 
and to associate a spatial position to each of these values. We first discuss this 
for 2D arrays in 2D space. Such textures can be applied in 3D by using two of 
the dimensions, e.g. x and y, to determine what texture values are used. We then 
extend those 2D results to 3D. 

Figure 11.1. Various stripe textures result from drawing a regular array of
xy points while keeping z constant.

We will assume the two dimensions to be mapped are called $u$ and $v$. We also
assume we have an $n_x$ by $n_y$ image that we use as the texture. Somehow we need
every $(u, v)$ to have an associated color found from the image. A fairly standard
way to make texturing work for $(u, v)$ is to first remove the integer portion of
$(u, v)$ so that it lies in the unit square. This has the effect of “tiling” the entire
$uv$ plane with copies of the now-square texture (Figure 11.2). We then use one of
three interpolation strategies to compute the image color for that coordinate. The
simplest strategy is to treat each image pixel as a constant colored rectangular tile
(Figure 11.3 (a)). To compute the colors, we apply $c(u, v) = c_{ij}$, where $c(u, v)$ is
the texture color at $(u, v)$ and $c_{ij}$ is the pixel color for pixel indices:

$$i = \lfloor u n_x \rfloor, \qquad j = \lfloor v n_y \rfloor; \tag{11.1}$$

$\lfloor x \rfloor$ is the floor of $x$, $(n_x, n_y)$ is the size of the image being textured, and the
indices start at $(i, j) = (0, 0)$. This method for a simple image is shown in
Figure 11.3 (b).

Figure 11.2. The tiling of an image onto the $(u, v)$ plane. Note that the input image is
rectangular, and that this rectangle is mapped to a unit square on the $(u, v)$ plane.


For a smoother texture, a bilinear interpolation can be used as shown in
Figure 11.3 (c). Here we use the formula

$$
\begin{aligned}
c(u, v) = {} & (1 - u')(1 - v')\,c_{ij} \\
& + u'(1 - v')\,c_{(i+1)j} \\
& + (1 - u')v'\,c_{i(j+1)} \\
& + u'v'\,c_{(i+1)(j+1)},
\end{aligned}
$$

where

$$u' = n_x u - \lfloor n_x u \rfloor, \qquad v' = n_y v - \lfloor n_y v \rfloor.$$

The discontinuities in the derivative in intensity can cause visible Mach bands, so
Hermite smoothing can be used:

$$
\begin{aligned}
c(u, v) = {} & (1 - u'')(1 - v'')\,c_{ij} \\
& + u''(1 - v'')\,c_{(i+1)j} \\
& + (1 - u'')v''\,c_{i(j+1)} \\
& + u''v''\,c_{(i+1)(j+1)},
\end{aligned}
$$

where

$$u'' = 3(u')^2 - 2(u')^3, \qquad v'' = 3(v')^2 - 2(v')^3,$$

which results in Figure 11.3 (d).

Figure 11.3. (a) The image on the left has nine pixels that are all either black or white. The
three interpolation strategies are (b) nearest-neighbor, (c) bilinear, and (d) Hermite.
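
The three lookup strategies can be sketched as follows. Several details here are assumptions rather than part of the text: the image is a grayscale list of lists indexed as `image[i][j]`, neighbor indices wrap around so that the tiled texture is continuous across tile boundaries, and the function names `lookup_nearest` and `lookup_smooth` are hypothetical.

```python
import math

def _tile_coords(image, u, v):
    """Remove the integer part of (u, v) and scale to pixel coordinates."""
    nx, ny = len(image), len(image[0])
    x = (u - math.floor(u)) * nx
    y = (v - math.floor(v)) * ny
    i = min(int(math.floor(x)), nx - 1)  # guard against rounding up to nx
    j = min(int(math.floor(y)), ny - 1)
    return nx, ny, i, j, x - i, y - j

def lookup_nearest(image, u, v):
    """Equation (11.1): each pixel is a constant-colored tile."""
    _, _, i, j, _, _ = _tile_coords(image, u, v)
    return image[i][j]

def lookup_smooth(image, u, v, hermite=False):
    """Bilinear lookup; with hermite=True the weights are 3t^2 - 2t^3."""
    nx, ny, i, j, up, vp = _tile_coords(image, u, v)
    if hermite:
        up = 3 * up ** 2 - 2 * up ** 3
        vp = 3 * vp ** 2 - 2 * vp ** 3
    i1, j1 = (i + 1) % nx, (j + 1) % ny  # wrap-around is an assumption
    return ((1 - up) * (1 - vp) * image[i][j]
            + up * (1 - vp) * image[i1][j]
            + (1 - up) * vp * image[i][j1]
            + up * vp * image[i1][j1])
```

For RGB images the same weights apply to each channel independently.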

In 3D, we have a 3D array of values. All of the ideas from 2D extend naturally.
As an example, let’s assume that we will do trilinear interpolation between
values. First, we compute the texture coordinates $(u', v', w')$ and the lower indices
$(i, j, k)$ of the array element to be interpolated:

$$
\begin{aligned}
c(u, v, w) = {} & (1 - u')(1 - v')(1 - w')\,c_{ijk} \\
& + u'(1 - v')(1 - w')\,c_{(i+1)jk} \\
& + (1 - u')v'(1 - w')\,c_{i(j+1)k} \\
& + (1 - u')(1 - v')w'\,c_{ij(k+1)} \\
& + u'v'(1 - w')\,c_{(i+1)(j+1)k} \\
& + u'(1 - v')w'\,c_{(i+1)j(k+1)} \\
& + (1 - u')v'w'\,c_{i(j+1)(k+1)} \\
& + u'v'w'\,c_{(i+1)(j+1)(k+1)},
\end{aligned}
\tag{11.2}
$$

where

$$
\begin{aligned}
u' &= n_x u - \lfloor n_x u \rfloor, \\
v' &= n_y v - \lfloor n_y v \rfloor, \\
w' &= n_z w - \lfloor n_z w \rfloor.
\end{aligned}
\tag{11.3}
$$


11.1.3 Solid Noise 

Although regular textures such as stripes are often useful, we would like to be able
to make “mottled” textures such as we see on birds’ eggs. This is usually done
by using a sort of “solid noise,” usually called Perlin noise after its inventor, who
received a technical Academy Award for its impact in the film industry (Perlin,
1985).

Figure 11.4. Absolute value of solid noise, and noise for scaled x and y values.

Getting a noisy appearance by calling a random number for every point would 
not be appropriate, because it would just be like “white noise” in TV static. We 
would like to make it smoother without losing the random quality. One possibility 
is to blur white noise, but there is no practical implementation of this. Another 
possibility is to make a large lattice with a random number at every lattice point, 
and then interpolate these random points for new points between lattice nodes; 
this is just a 3D texture array as described in the last section with random numbers 
in the array. This technique makes the lattice too obvious. Perlin used a variety 
of tricks to improve this basic lattice technique so the lattice was not so obvious. 
This results in a rather baroque-looking set of steps, but essentially there are just 
three changes from linearly interpolating a 3D array of random values. The first 
change is to use Hermite interpolation to avoid Mach bands, just as can be done
with regular textures. The second change is the use of random vectors rather than
values, with a dot product to derive a random number; this makes the underlying
grid structure less visually obvious by moving the local minima and maxima off
the grid vertices. The third change is to use a 1D array and hashing to create a
virtual 3D array of random vectors. This adds computation to lower memory use. 
Here is his basic method: 

$$n(x, y, z) = \sum_{i=\lfloor x \rfloor}^{\lfloor x \rfloor + 1} \; \sum_{j=\lfloor y \rfloor}^{\lfloor y \rfloor + 1} \; \sum_{k=\lfloor z \rfloor}^{\lfloor z \rfloor + 1} \Omega_{ijk}(x - i,\, y - j,\, z - k),$$

where $(x, y, z)$ are the Cartesian coordinates of $\mathbf{x}$, and

$$\Omega_{ijk}(u, v, w) = \omega(u)\,\omega(v)\,\omega(w)\,(\Gamma_{ijk} \cdot (u, v, w)),$$

and $\omega(t)$ is the cubic weighting function:

$$\omega(t) = \begin{cases} 2|t|^3 - 3|t|^2 + 1 & \text{if } |t| < 1, \\ 0 & \text{otherwise.} \end{cases}$$

The final piece is that $\Gamma_{ijk}$ is a random unit vector for the lattice point
$(x, y, z) = (i, j, k)$. Since we want any potential $ijk$, we use a pseudorandom table:

$$\Gamma_{ijk} = G(\phi(i + \phi(j + \phi(k)))),$$

where $G$ is a precomputed array of $n$ random unit vectors, and
$\phi(i) = P[i \bmod n]$ where $P$ is an array of length $n$ containing a permutation of the
integers 0 through $n - 1$. In practice, Perlin reports $n = 256$ works well. To
choose a random unit vector $(v_x, v_y, v_z)$ first set

$$v_x = 2\xi - 1, \qquad v_y = 2\xi' - 1, \qquad v_z = 2\xi'' - 1,$$

where $\xi$, $\xi'$, $\xi''$ are canonical random numbers (uniform in the interval $[0, 1)$).
Then, if $(v_x^2 + v_y^2 + v_z^2) < 1$, make the vector a unit vector. Otherwise keep setting
it randomly until its length is less than one, and then make it a unit vector. This
is an example of a rejection method, which will be discussed more in Chapter 14.
Essentially, the “less than” test gets a random point in the unit sphere, and the
direction of the vector from the origin to that point is uniformly random. That
would not be true of random points in the cube, so we “get rid” of the corners
with the test.

Because solid noise can be positive or negative, it must be transformed before
being converted to a color. The absolute value of noise over a ten by ten square is
shown in Figure 11.4, along with stretched versions. These versions are stretched
by scaling the points input to the noise function.

The dark curves are where the original noise function changed from positive
to negative. Since noise varies from −1 to 1, a smoother image can be achieved
by using (noise + 1)/2 for color. However, since noise values close to 1 or −1 are
rare, this will be a fairly smooth image. Larger scaling can increase the contrast
(Figure 11.5).
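
The solid noise function described in this section can be sketched compactly in Python. This is an illustrative reading of the method, not Perlin's original code: the fixed seed, the use of Python's `random.Random`, and the names `G`, `P`, `phi`, `gamma`, `omega`, and `noise` are assumptions chosen to mirror the symbols in the text.

```python
import math
import random

N = 256
_rng = random.Random(0)  # fixed seed so the texture is repeatable (assumption)

def _random_unit_vector(rng):
    """Rejection method: sample the cube, keep points inside the unit sphere."""
    while True:
        v = [2.0 * rng.random() - 1.0 for _ in range(3)]
        len2 = sum(c * c for c in v)
        if 0.0 < len2 < 1.0:
            s = math.sqrt(len2)
            return tuple(c / s for c in v)

G = [_random_unit_vector(_rng) for _ in range(N)]  # table of unit vectors
P = list(range(N))                                 # permutation of 0..N-1
_rng.shuffle(P)

def phi(i):
    return P[i % N]

def gamma(i, j, k):
    """Hashed lookup of the random unit vector for lattice point (i, j, k)."""
    return G[phi(i + phi(j + phi(k)))]

def omega(t):
    """Cubic weighting function."""
    t = abs(t)
    return 2.0 * t ** 3 - 3.0 * t ** 2 + 1.0 if t < 1.0 else 0.0

def noise(x, y, z):
    """Sum the weighted contributions of the eight surrounding lattice points."""
    fx, fy, fz = math.floor(x), math.floor(y), math.floor(z)
    total = 0.0
    for i in (fx, fx + 1):
        for j in (fy, fy + 1):
            for k in (fz, fz + 1):
                u, v, w = x - i, y - j, z - k
                g = gamma(i, j, k)
                total += (omega(u) * omega(v) * omega(w)
                          * (g[0] * u + g[1] * v + g[2] * w))
    return total
```

Two properties are easy to check: the noise is exactly zero at every lattice point, and calling it twice with the same point gives the same value, which is what makes it usable as a texture. An intensity can then be formed as, for example, `0.5 * (noise(x, y, z) + 1)`.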



Figure 11.5. Using 0.5(noise+1) (top) and 0.8(noise+1) (bottom) for intensity.

11.1.4 Turbulence 

Many natural textures contain a variety of feature sizes in the same texture. Perlin 
uses a pseudofractal “turbulence” function: 


$$n_t(\mathbf{x}) = \sum_i \frac{|n(2^i \mathbf{x})|}{2^i}.$$

This effectively repeatedly adds scaled copies of the noise function on top of itself 
as shown in Figure 11.6. 

The turbulence can be used to distort the stripe function: 


    RGB turbstripe( point p, double w )
        double t = (1 + sin(k_1 z_p + turbulence(k_2 p)) / w) / 2
        return t * s0 + (1 − t) * s1

Various values for $k_1$ and $k_2$ were used to generate Figure 11.7.
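
The turbulence sum and the distorted stripe can be sketched as below. To keep the block self-contained, the noise function is passed in as a parameter; the names `turbulence` and `turbstripe`, the `terms` argument, and the choice to start the sum at $i = 0$ are assumptions. The stripe expression is transcribed as written in the pseudocode above.

```python
import math

def turbulence(noise, p, terms=8):
    """Sum |noise(2^i p)| / 2^i over the first `terms` octaves (i starts at 0)."""
    total = 0.0
    for i in range(terms):
        s = 2.0 ** i
        total += abs(noise(s * p[0], s * p[1], s * p[2])) / s
    return total

def turbstripe(noise, p, w, k1, k2, s0, s1, terms=8):
    """Stripe along z whose phase is distorted by turbulence."""
    q = (k2 * p[0], k2 * p[1], k2 * p[2])
    t = (1.0 + math.sin(k1 * p[2] + turbulence(noise, q, terms)) / w) / 2.0
    return tuple(t * a + (1.0 - t) * b for a, b in zip(s0, s1))
```

Any function of three coordinates that returns values in roughly [−1, 1] can stand in for `noise` while experimenting.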

Figure 11.6. Turbulence function with (from top left to bottom right) one through eight terms
in the summation.

Figure 11.7. Various turbulent stripe textures with different $k_1$, $k_2$. The top row has only the
first term of the turbulence series.

11.2 2D Texture Mapping 

For 2D texture mapping, we use a 2D coordinate, often called $uv$, which is used
to create a reflectance $R(u, v)$. The key is to take an image and associate a $(u, v)$
coordinate system on it so that it can, in turn, be associated with points on a 3D
surface. For example, if the latitudes and longitudes on the world map are
associated with a polar coordinate system on the sphere, we get a globe (Figure 11.8).

It is crucial that the coordinates on the image and the object match in “just the
right way.” As a convention, the coordinate system on the image is set to be the
unit square $(u, v) \in [0, 1]^2$. For $(u, v)$ outside of this square, only the fractional
parts of the coordinates are used, resulting in a tiling of the plane (Figure 11.2).
Note that the image has a different number of pixels horizontally and vertically,
so the image pixels have a non-uniform aspect ratio in $(u, v)$ space.

Figure 11.8. A Miller cylindrical projection world map and its placement on the sphere.
The distortions in the texture map (i.e., Greenland being so large) exactly correspond to the
shrinking that occurs when the map is applied to the sphere.

To map this $(u, v) \in [0, 1]^2$ image onto a sphere, we first compute the polar
coordinates. Recall the spherical coordinate system described by Equation (2.25).
For a sphere of radius $R$ with center $(x_c, y_c, z_c)$, the parametric equation of the
sphere is

$$
\begin{aligned}
x &= x_c + R \cos\phi \sin\theta, \\
y &= y_c + R \sin\phi \sin\theta, \\
z &= z_c + R \cos\theta.
\end{aligned}
$$

We can find $(\theta, \phi)$:

$$
\begin{aligned}
\theta &= \arccos\!\left(\frac{z - z_c}{R}\right), \\
\phi &= \operatorname{arctan2}(y - y_c,\; x - x_c),
\end{aligned}
$$

where $\operatorname{arctan2}(a, b)$ is the atan2 of most math libraries, which returns the
arctangent of $a/b$. Because $(\theta, \phi) \in [0, \pi] \times [-\pi, \pi]$, we convert to $(u, v)$ as
follows, after first adding $2\pi$ to $\phi$ if it is negative:

$$u = \frac{\phi}{2\pi}, \qquad v = \frac{\pi - \theta}{\pi}.$$

This mapping is shown in Figure 11.8. There is a similar, although likely more
complicated way, to generate coordinates for most 3D shapes.
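
The sphere mapping translates almost line for line into Python, whose `math.atan2(y, x)` takes its arguments in the same order as the arctan2 above. The function name `sphere_uv` is an assumption, and the clamp on the arccosine argument is an added safeguard against rounding error for points very near the poles.

```python
import math

def sphere_uv(p, center, radius):
    """Map a point p on a sphere to (u, v) texture coordinates in [0, 1]."""
    cos_theta = (p[2] - center[2]) / radius
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))  # clamp for safety
    phi = math.atan2(p[1] - center[1], p[0] - center[0])
    if phi < 0.0:
        phi += 2.0 * math.pi
    return (phi / (2.0 * math.pi), (math.pi - theta) / math.pi)
```

On a unit sphere at the origin, the point $(1, 0, 0)$ maps to $(u, v) = (0, 0.5)$ and the point $(0, -1, 0)$ maps to $(0.75, 0.5)$.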


11.3 Texture Mapping for Rasterized Triangles

Figure 11.9. Top: a calibration texture map. Bottom: the sphere viewed
along the y-axis.

For surfaces represented by triangle meshes, texture coordinates are defined by
storing $(u, v)$ texture coordinates at each vertex of the mesh (see Section 12.1).
So, if a triangle is intersected at barycentric coordinates $(\beta, \gamma)$, you interpolate
the $(u, v)$ coordinates the same way you interpolate points. Recall that the point
at barycentric coordinate $(\beta, \gamma)$ is

$$\mathbf{p}(\beta, \gamma) = \mathbf{a} + \beta(\mathbf{b} - \mathbf{a}) + \gamma(\mathbf{c} - \mathbf{a}).$$

A similar equation applies for $(u, v)$:

$$
\begin{aligned}
u(\beta, \gamma) &= u_a + \beta(u_b - u_a) + \gamma(u_c - u_a), \\
v(\beta, \gamma) &= v_a + \beta(v_b - v_a) + \gamma(v_c - v_a).
\end{aligned}
$$

Several ways a texture can be applied by changing the $(u, v)$ at triangle
vertices are shown in Figure 11.10. This sort of calibration texture map makes it
easier to understand the texture coordinates of your objects during debugging
(Figure 11.9).

We would like to get the same texture images whether we use a ray tracing
program or a rasterization method, such as a z-buffer. There are some subtleties in
achieving this with correct-looking perspective, but we can address this at the
rasterization stage. The reason things are not straightforward is that just interpolating
texture coordinates in screen space results in incorrect images, as shown for the
grid texture shown in Figure 11.11. Because things in perspective get smaller as
the distance to the viewer increases, the lines that are evenly spaced in 3D should
compress in 2D image space. More careful interpolation of texture coordinates is
needed to accomplish this.

Figure 11.10. Various mesh textures obtained by changing $(u, v)$ coordinates stored at
vertices.


11.3.1 Perspective Correct Textures 

We can implement texture mapping on triangles by interpolating the $(u, v)$
coordinates, modifying the rasterization method of Section 8.1.2, but this results in the
problem shown at the right of Figure 11.11. A similar problem occurs for triangles
if screen space barycentric coordinates are used as in the following rasterization
code:

    for all x do
        for all y do
            compute (α, β, γ) for (x, y)
            if α ∈ (0, 1) and β ∈ (0, 1) and γ ∈ (0, 1) then
                t = α t_0 + β t_1 + γ t_2
                drawpixel (x, y) with color texture(t) for a solid texture
                or with texture(β, γ) for a 2D texture.

This code will generate images, but there is a problem. To unravel the basic
problem, let’s consider the progression from world space $\mathbf{q}$ to homogeneous point $\mathbf{r}$ to
homogenized point $\mathbf{s}$:


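$$
\begin{bmatrix} x_q \\ y_q \\ z_q \\ 1 \end{bmatrix}
\xrightarrow{\text{transform}}
\begin{bmatrix} x_r \\ y_r \\ z_r \\ h_r \end{bmatrix}
\xrightarrow{\text{homogenize}}
\begin{bmatrix} x_r / h_r \\ y_r / h_r \\ z_r / h_r \\ 1 \end{bmatrix}
=
\begin{bmatrix} x_s \\ y_s \\ z_s \\ 1 \end{bmatrix}.
$$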

Figure 11.11. Left: correct perspective. Right: interpolation in screen space.

If we use screen space, we are interpolating in $\mathbf{s}$. However, we would like to be
interpolating in space $\mathbf{q}$ or $\mathbf{r}$, where the homogeneous division has not yet
non-linearly distorted the barycentric coordinates of the triangle.

The key observation is that $1/h_r$ is interpolated with no distortion. Likewise,
so is $u/h_r$ and $v/h_r$. In fact, so is $k/h_r$, where $k$ is any quantity that varies
linearly across the triangle. Recall from Section 7.4 that if we transform all points
along the line segment between points $\mathbf{q}$ and $\mathbf{Q}$ and homogenize, we have

$$\mathbf{s} + \frac{h_R\, t}{h_r + t(h_R - h_r)}\,(\mathbf{S} - \mathbf{s}),$$

but if we linearly interpolate in the homogenized space we have

$$\mathbf{s} + \alpha(\mathbf{S} - \mathbf{s}).$$


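Although those lines sweep out the same points, typically $\alpha \ne t$ for the same
points on the line segment. However, if we interpolate $1/h$, we do get the same
answer regardless of which space we interpolate in. To see this is true, confirm
(Exercise 2):

$$\frac{1}{h_r} + \frac{h_R\, t}{h_r + t(h_R - h_r)} \left( \frac{1}{h_R} - \frac{1}{h_r} \right) = \frac{1}{h_r + t(h_R - h_r)}. \tag{11.4}$$

The left side interpolates $1/h$ linearly in screen space using the $\alpha$ that
corresponds to $t$; the right side is $1/h$ evaluated directly at the point with
parameter $t$ before homogenization.

This ability to interpolate $1/h$ linearly with no error in the transformed space
allows us to correctly texture triangles. Perhaps the least confusing way to deal
with this distortion is to compute the world space barycentric coordinates of the
triangle $(\beta_w, \gamma_w)$ in terms of screen space coordinates $(\beta, \gamma)$. We note that $\beta_w/h$
and $\gamma_w/h$ can be interpolated linearly in screen space. For example, at the screen
space position associated with screen space barycentric coordinates $(\beta, \gamma)$, we
can interpolate $\beta_w/h$ without distortion. Because $\beta_w = 0$ at vertex 0 and vertex
2, and $\beta_w = 1$ at vertex 1, we have

$$\frac{\beta_w}{h} = \frac{0}{h_0} + \beta \left( \frac{1}{h_1} - \frac{0}{h_0} \right) + \gamma \left( \frac{0}{h_2} - \frac{0}{h_0} \right). \tag{11.5}$$

Because of all the zero terms, Equation (11.5) is fairly simple. However, to get
$\beta_w$ from it, we must know $h$. Because we know $1/h$ is linear in screen space, we
have

$$\frac{1}{h} = \frac{1}{h_0} + \beta \left( \frac{1}{h_1} - \frac{1}{h_0} \right) + \gamma \left( \frac{1}{h_2} - \frac{1}{h_0} \right). \tag{11.6}$$

Dividing Equation (11.5) by Equation (11.6) gives

$$\beta_w = \frac{\dfrac{\beta}{h_1}}{\dfrac{1}{h_0} + \beta \left( \dfrac{1}{h_1} - \dfrac{1}{h_0} \right) + \gamma \left( \dfrac{1}{h_2} - \dfrac{1}{h_0} \right)}.$$

Multiplying numerator and denominator by $h_0 h_1 h_2$ and doing a similar set of
manipulations for the analogous equations in $\gamma_w$ gives

$$
\begin{aligned}
\beta_w &= \frac{h_0 h_2 \beta}{h_1 h_2 + h_2 \beta (h_0 - h_1) + h_1 \gamma (h_0 - h_2)}, \\
\gamma_w &= \frac{h_0 h_1 \gamma}{h_1 h_2 + h_2 \beta (h_0 - h_1) + h_1 \gamma (h_0 - h_2)}.
\end{aligned}
\tag{11.7}
$$

Note that the two denominators are the same.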

For triangles that use the perspective matrix from Chapter 7, recall that
$w = z/n$ (the homogeneous coordinate, written $h$ in this section), where $z$ is
the distance from the viewer perpendicular to the screen. Thus,
for that matrix $1/z$ also varies linearly. We can use this fact to modify our
scan-conversion code for three points $\mathbf{t}_i = (x_i, y_i, z_i, h_i)$ that have been passed
through the viewing matrices, but have not been homogenized:

    Compute bounds for x = x_i / h_i and y = y_i / h_i
    for all x do
        for all y do
            compute (α, β, γ) for (x, y)
            if (α ∈ [0, 1] and β ∈ [0, 1] and γ ∈ [0, 1]) then
                d = h_1 h_2 + h_2 β (h_0 − h_1) + h_1 γ (h_0 − h_2)
                β_w = h_0 h_2 β / d
                γ_w = h_0 h_1 γ / d
                α_w = 1 − β_w − γ_w
                u = α_w u_0 + β_w u_1 + γ_w u_2
                v = α_w v_0 + β_w v_1 + γ_w v_2
                drawpixel (x, y) with color texture(u, v)

For solid textures, just recall that by the definition of barycentric coordinates

$$\mathbf{p} = (1 - \beta_w - \gamma_w)\,\mathbf{p}_0 + \beta_w \mathbf{p}_1 + \gamma_w \mathbf{p}_2,$$

where $\mathbf{p}_i$ are the world space vertices. Then, just call a solid texture routine for
point $\mathbf{p}$.
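
The conversion from screen space to world space barycentric coordinates is the heart of the method, so it is worth isolating. The sketch below implements Equation (11.7) and the texture coordinate interpolation that follows it; the function names `perspective_correct` and `interpolate_uv` and the argument layout are assumptions.

```python
def perspective_correct(beta, gamma, h0, h1, h2):
    """Equation (11.7): screen space (beta, gamma) to world space barycentrics."""
    d = h1 * h2 + h2 * beta * (h0 - h1) + h1 * gamma * (h0 - h2)
    beta_w = h0 * h2 * beta / d
    gamma_w = h0 * h1 * gamma / d
    return (1.0 - beta_w - gamma_w, beta_w, gamma_w)

def interpolate_uv(beta, gamma, h, uvs):
    """Blend per-vertex (u, v) pairs using world space barycentrics.

    h is the triple (h0, h1, h2); uvs is a list of three (u, v) pairs.
    """
    a_w, b_w, g_w = perspective_correct(beta, gamma, h[0], h[1], h[2])
    u = a_w * uvs[0][0] + b_w * uvs[1][0] + g_w * uvs[2][0]
    v = a_w * uvs[0][1] + b_w * uvs[1][1] + g_w * uvs[2][1]
    return (u, v)
```

As a check, take $h_0 = 1$, $h_1 = 2$, $h_2 = 1$ and the screen space midpoint of the edge from vertex 0 to vertex 1, $(\beta, \gamma) = (0.5, 0)$. The result is $\beta_w = 1/3$: the pixel halfway along the edge on screen is only a third of the way along it in world space, because vertex 1 is farther from the viewer. When all three $h_i$ are equal, the coordinates are returned unchanged.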

11.4 Bump Textures 

Although we have only discussed changing reflectance using texture, you can also
change the surface normal to give an illusion of fine-scale geometry on the
surface. We can apply a bump map that perturbs the surface normal
(J. F. Blinn, 1978).

One way to do this is:

    vector3 n = surfaceNormal(x)
    n += k_1 * vectorTurbulence(k_2 * x)
    return n

This is shown in Figure 11.12. 

To implement vectorTurbulence , we first need vectorNoise which produces a 
simple spatially-varying 3D vector: 

LccJ+i LyJ+i kl+i 

n v (x,y,z)= r ijk u}(x)u{y)(j{z). 

*=L*J j—Vv\ fc=L^J 
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Figure 11.12. Vector tur¬ 
bulence on a sphere of ra¬ 
dius 1.6. Lighting directly 
from above. Top: k-i = 0. 
Middle: k, = 0.08, k 2 = 8. 
Bottom: k-i =0.24, k 2 =8. 



Figure 11.13. The points 
p on the circle are each dis¬ 
placed in the direction of n 
by the function f( p). If fis 
continuous, then the result¬ 
ing points p' form a contin¬ 
uous surface. 


Then, vectorTurbulence is a direct analog of turbulence: sum a series of scaled 
versions of vectorNoise. 
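
A rough Python sketch of this idea is given below. Several pieces are assumptions made only for illustration: the cubic weight `omega`, evaluated at the offset of the sample point from each lattice corner; the hash-seeded `lattice_vector` standing in for the random lattice vectors; and the choice of four octaves with each octave doubling the frequency and halving the amplitude.

```python
import math
import random

def omega(t):
    """Assumed cubic weight: 1 at t = 0, falling smoothly to 0 at |t| = 1."""
    t = abs(t)
    return 2 * t**3 - 3 * t**2 + 1 if t < 1 else 0.0

def lattice_vector(i, j, k):
    """Hypothetical stand-in for the lattice vectors: a repeatable unit vector."""
    rng = random.Random(hash((i, j, k)))
    v = [rng.gauss(0, 1) for _ in range(3)]
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]

def vector_noise(x, y, z):
    """Blend the eight lattice vectors surrounding (x, y, z)."""
    fx, fy, fz = math.floor(x), math.floor(y), math.floor(z)
    total = [0.0, 0.0, 0.0]
    for i in (fx, fx + 1):
        for j in (fy, fy + 1):
            for k in (fz, fz + 1):
                w = omega(x - i) * omega(y - j) * omega(z - k)
                g = lattice_vector(i, j, k)
                for a in range(3):
                    total[a] += w * g[a]
    return total

def vector_turbulence(x, y, z, octaves=4):
    """Sum scaled copies of vector_noise: frequency doubles, amplitude halves."""
    total = [0.0, 0.0, 0.0]
    for m in range(octaves):
        s = 2 ** m
        n = vector_noise(s * x, s * y, s * z)
        for a in range(3):
            total[a] += n[a] / s
    return total

def bumped_normal(n, p, k1, k2):
    """Perturb normal n at point p and renormalize the result."""
    t = vector_turbulence(k2 * p[0], k2 * p[1], k2 * p[2])
    perturbed = [n[a] + k1 * t[a] for a in range(3)]
    length = math.sqrt(sum(c * c for c in perturbed))
    return [c / length for c in perturbed]
```

With `k1 = 0` the normal comes back unchanged (apart from normalization), which corresponds to the unperturbed sphere.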


11.5 Displacement Mapping 

One problem with Figure 11.12 is that the bumps neither cast shadows nor affect 
the silhouette of the object. These limitations occur because we are not really 
changing any geometry. If we want more realism, we can apply a displacement 
map (Cook et al., 1987). A displacement map actually changes the geometry 
using a texture. A common simplification is that the displacement will be in the 
direction of the surface normal. 

If we take all points p on a surface, with associated surface normal vectors n,
then we can make a new surface using a 3D texture f(p):

$$\mathbf{p}' = \mathbf{p} + f(\mathbf{p})\,\mathbf{n}.$$

This concept is shown in Figure 11.13. 

Displacement mapping is straightforward to implement in a z-buffer code by 
storing the surface to be displaced as a fine mesh of many triangles. Each vertex 
in the mesh can then be displaced along the normal vector direction. This results 
in large models, but it is quite robust. 
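
The per-vertex displacement can be sketched in a few lines of Python. The function name `displace_vertices` and the list-of-tuples mesh layout are assumptions for this example; `f` is any scalar function of position.

```python
def displace_vertices(positions, normals, f):
    """Move each vertex p along its normal n by the scalar amount f(p)."""
    displaced = []
    for p, n in zip(positions, normals):
        amount = f(p)
        displaced.append(tuple(p[a] + amount * n[a] for a in range(3)))
    return displaced
```

For example, displacing points on a unit sphere (where each normal equals the position) by a constant 0.5 yields points on a sphere of radius 1.5.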


11.6 Environment Maps 

Often we would like to have a texture-mapped background and for objects to 
have specular reflections of that background. This can be accomplished using 
environment maps (J. F. Blinn, 1976). An environment map can be implemented 
as a background function that takes in a viewing direction b and returns an RGB
color from a texture map. There are many ways to store environment maps. For 
example, we can use a spherical table indexed by spherical coordinates. In this 
section, we will instead describe a cube-based table with six square texture maps, 
often called a cube map. 

Figure 11.14. The cube map has six axis-aligned textures that store the background. The
right face contains a single texture.

The basic idea of a cube map is that we have an infinitely large cube with
a texture on each face. Because the cube is large, the origin of a ray does not
change what the ray “sees.” This is equivalent to an arbitrarily sized cube that is
queried by a ray whose origin is at the Cartesian origin. As an example of how
a given direction b is converted to (u, v) coordinates, consider the right face of
Figure 11.14. Here we have $x_b$ as the maximum-magnitude component. In that
case, we can compute (u, v) for that texture to be

$$u = \frac{y + x}{2x}, \qquad v = \frac{z + x}{2x}.$$

There are analogous formulas for the other five faces.
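
The lookup can be sketched in Python as below. Only the formula for the positive-x face comes from the text; the face labels and the choice of which two components feed u and v on the other five faces are assumptions made here, and real systems each fix their own convention.

```python
def cubemap_uv(b):
    """Map a direction b = (x, y, z) to (face, u, v) with u, v in [0, 1]."""
    x, y, z = b
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        # x has the largest magnitude; for x > 0 this is the face in the text.
        face, m, s, t = ('+x' if x > 0 else '-x'), ax, y, z
    elif ay >= az:
        face, m, s, t = ('+y' if y > 0 else '-y'), ay, z, x
    else:
        face, m, s, t = ('+z' if z > 0 else '-z'), az, x, y
    if m == 0:
        raise ValueError("direction must be nonzero")
    # Assumed analog of u = (y + x) / (2x): divide by the dominant magnitude.
    return face, (s + m) / (2 * m), (t + m) / (2 * m)
```

A direction straight down an axis lands in the center of the corresponding face, at (u, v) = (0.5, 0.5).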

So for any reflection ray a + tb we return cubemap(b) for the background
color. In a z-buffer implementation, we need to perform this calculation on a 
pixel-by-pixel basis. If at a given pixel we know the viewing direction c and the 
surface normal vector n, we can compute the reflected direction b (Figure 11.15). 
We can do this by modifying Equation (10.6) to get 


$$\mathbf{b} = -\mathbf{c} + \frac{2(\mathbf{c} \cdot \mathbf{n})\,\mathbf{n}}{\|\mathbf{c}\|^2}. \qquad (11.8)$$


Here the denominator of the fraction accounts for the fact that c may not be a unit 
vector. Because we need to know b at each pixel, we can either compute b at 
each triangle vertex and interpolate b in a perspective-correct manner, or we can
interpolate n and compute b for each pixel. This will allow us to call cubemap(b)
at each pixel. 
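
A per-pixel reflection computation might look like the following Python sketch. It makes one simplifying assumption that is not part of the text: both c and n are normalized up front, so the reflection formula can be applied in its unit-vector form. The helper names are hypothetical.

```python
import math

def normalize(v):
    """Scale v to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def reflect(c, n):
    """Reflect the viewing direction c about the surface normal n.

    c points from the surface toward the viewer; the result is a unit vector.
    """
    c = normalize(c)
    n = normalize(n)
    c_dot_n = sum(a * b for a, b in zip(c, n))
    return tuple(-ci + 2.0 * c_dot_n * ni for ci, ni in zip(c, n))
```

The returned direction is what would be passed to the cube map lookup for that pixel.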



Figure 11.15. The vector b is the reflection of vector c with respect to the surface
normal n.

11.7 Shadow Maps

The basic observation to be made about a shadow map is that if we rendered the
scene using the location of a light source as the eye, the visible surfaces would all
be lit, and the hidden surfaces would all be in shadow. This can be used to determine
whether a point being rasterized is in shadow (L. Williams, 1978). First, we
rasterize the scene from the point of view of the light source using matrix $M_s$.
This matrix is just the same as the full transform matrix M used for viewing in
Section 7.3, but it uses the light position for the eye and the light’s main direction
for the view-plane normal.

Recall that the matrix M takes an (x, y, z) in world coordinates and converts
it to an (x', y', z') in relation to the screen. While rasterizing in a perspectively
correct manner, we can get the (x, y, z) that is seen through the center of each
pixel. If we also rasterize that point using $M_s$ and round the resulting x- and
y-coordinates, we will get

(i, j, depth).

We can compare this depth with the z-value in the shadow depth map at pixel
(i, j). If it is the same, then the point is lit, and otherwise it is in shadow. Because
of computational inaccuracies, we should actually test whether the points are the
same to within a small constant.
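
The depth comparison can be sketched in Python as follows. The names `transform_point` and `is_lit`, the default tolerance, and the decision to treat points that fall outside the shadow buffer as unlit are all assumptions made for this example.

```python
def transform_point(M, p):
    """Apply a 4x4 matrix (list of rows) to a 3D point and homogenize."""
    x, y, z = p
    out = [M[r][0] * x + M[r][1] * y + M[r][2] * z + M[r][3] for r in range(4)]
    return out[0] / out[3], out[1] / out[3], out[2] / out[3]

def is_lit(shadow_depth, M_s, p, eps=1e-3):
    """Return True if world point p is visible from the light.

    shadow_depth: square 2D list of depths rendered from the light
    M_s:          full transform matrix built from the light's point of view
    eps:          tolerance for the depth comparison
    """
    x, y, depth = transform_point(M_s, p)
    i, j = round(x), round(y)
    n = len(shadow_depth)
    if not (0 <= i < n and 0 <= j < n):
        return False  # assumed policy: outside the light's window means unlit
    return abs(depth - shadow_depth[i][j]) <= eps
```

Choosing `eps` is a trade-off: too small and lit surfaces shadow themselves, too large and shadows detach from the objects that cast them.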

Because we typically don't want the light to only be within a square window, 
often a spot light is used. This attenuates the value of the light source based on 
closeness to the sides of the shadow buffer. For example, if the shadow buffer is 
n × n pixels, then for pixel (i, j) in the shadow buffer, we can apply the attenuation
coefficient based on the fractional radius r: 



Any radially decreasing function will then give a spot-like look. 


Frequently Asked Questions 

• How do I implement displacement mapping in ray tracing? 

There is no ideal way to do it. Generating all the triangles and caching the
geometry when necessary will prevent memory overload (Pharr & Hanrahan, 1996;
Pharr et al., 1997). Trying to intersect the displaced surface directly is possible
when the displacement function is restricted (Patterson et al., 1991; Heidrich &
Seidel, 1998; Smits et al., 2000).

• Why don’t my images with textures look realistic? 

Humans are good at seeing small imperfections in surfaces. Geometric imperfections
are typically absent in computer-generated images that use texture maps for
details, so they look “too smooth.”

• My textured animations look bad when there are many texels visible inside
a pixel. What should I do?

The problem is that the texture resolution is too high for that image. We would 
like a smaller down-sampled version of the texture. However, if we move closer, 
such a down-sampled texture would look too blurry. What we really need is to 
be able to dynamically choose the texture resolution based on viewing conditions 
so that about one texel is visible through each pixel. A common way to do that 
is to use MIP-mapping (L. Williams, 1983). That technique establishes a
multiresolution set of textures and chooses one of the textures for each polygon or
pixel. Typically the resolutions vary by a factor of two, e.g., $512^2$, $256^2$, $128^2$,
etc.
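
As a rough Python illustration of the idea, the sketch below builds such a chain for a square, power-of-two grayscale texture and picks a level from the number of texels covered by one pixel. The 2 × 2 box average used for down-sampling and the rounding rule in `choose_level` are assumptions made here, not details given in the text.

```python
import math

def build_mip_chain(texture):
    """Return [level 0, level 1, ...], halving the resolution at each level."""
    levels = [texture]
    while len(levels[-1]) > 1:
        src = levels[-1]
        half = len(src) // 2
        levels.append([
            [(src[2 * r][2 * c] + src[2 * r][2 * c + 1] +
              src[2 * r + 1][2 * c] + src[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(half)]
            for r in range(half)
        ])
    return levels

def choose_level(texels_per_pixel, num_levels):
    """Pick the level at which roughly one texel spans one pixel.

    texels_per_pixel is measured along one axis at the finest level.
    """
    if texels_per_pixel <= 1:
        return 0
    level = round(math.log2(texels_per_pixel))
    return min(level, num_levels - 1)
```

For a 4 × 4 texture this gives three levels (4 × 4, 2 × 2, and 1 × 1), and a pixel covering four texels along an axis selects the coarsest one.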

Notes 

The discussion of perspective-correct textures is based on Fast Shadows and
Lighting Effects Using Texture Mapping (Segal et al., 1992) and on 3D Game
Engine Design (Eberly, 2000).


Exercises 

1. Find several ways to implement an infinite 2D checkerboard using surface 
and solid techniques. Which is best? 

2. Verify that Equation (11.4) is a valid equality using brute-force algebra. 

3. How could you implement solid texturing by using the z-buffer depth and 
a matrix transform? 
Data Structures for Graphics


Certain data structures seem to pop up repeatedly in graphics applications, perhaps
because they address fundamental underlying ideas like surfaces, space, and
scene structure. This chapter talks about several basic and unrelated categories 
of data structures that are among the most common and useful: mesh structures, 
spatial data structures, scene graphs, and tiled multidimensional arrays. 

For meshes, we discuss the basic storage schemes used for storing static 
meshes and for transferring meshes to graphics APIs. We also discuss the
winged-edge data structure (Baumgart, 1974) and the related half-edge structure, which
are useful for managing models where the tessellation changes, such as in
subdivision or model simplification. Although these methods generalize to arbitrary
polygon meshes, we focus on the simpler case of triangle meshes here. 

Next, the scene-graph data structure is presented. Various forms of this data 
structure are ubiquitous in graphics applications because they are so useful in 
managing objects and transformations. All new graphics APIs are designed to 
support scene graphs well. 

For spatial data structures, we discuss three approaches to organizing models 
in 3D space—bounding volume hierarchies, hierarchical space subdivision, and 
uniform space subdivision—and the use of hierarchical space subdivision (BSP 
trees) for hidden surface removal. The same methods are also used for other 
purposes including geometry culling and collision detection. 

Finally, the tiled multidimensional array is presented. Originally developed 
to help paging performance in applications where graphics data needed to be 
swapped in from disk, such structures are now crucial for memory locality on 
machines regardless of whether the array fits in main memory. 
12.1 Triangle Meshes

Most real-world models are composed of complexes of triangles with shared vertices.
These are usually known as triangular meshes, triangle meshes, or triangular
irregular networks (TINs), and handling them efficiently is crucial to the
performance of many graphics programs. The kind of efficiency that is important
depends on the application. Meshes are stored on disk and in memory, and
we’d like to minimize the amount of storage consumed. When meshes are transmitted
across networks or from the CPU to the graphics system, they consume
bandwidth, which is often even more precious than storage. In applications that
perform operations on meshes beyond simply storing and drawing them (such
as subdivision, mesh editing, or mesh compression), efficient
access to adjacency information is crucial.

Triangle meshes are generally used to represent surfaces, so a mesh is not just 
a collection of unrelated triangles, but rather a network of triangles that connect to 
one another through shared vertices and edges to form a single continuous surface. 
This is a key insight about meshes: a mesh can be handled more efficiently than a 
collection of the same number of unrelated triangles. 

The minimum information required for a triangle mesh is a set of triangles 
(triples of vertices) and the positions (in 3D space) of their vertices. But many, 
if not most, programs require the ability to store additional data at the vertices, 
edges, or faces to support texture mapping, shading, animation, and other operations.
Vertex data is the most common: each vertex can have material parameters,
texture coordinates, irradiances—any parameters whose values change across the 
surface. These parameters are then linearly interpolated across each triangle to 
define a continuous function over the whole surface of the mesh. However, it is 
also occasionally important to be able to store data per edge or per face. 


12.1.1 Mesh Topology 


We’ll leave the precise definitions to the mathematicians; see the chapter
notes.


The idea that meshes are surface-like can be formalized as constraints on the mesh 
topology—the way the triangles connect together, without regard for the vertex 
positions. Many algorithms will only work, or are much easier to implement, on a 
mesh with predictable connectivity. The simplest and most restrictive requirement 
on the topology of a mesh is for the surface to be a manifold. A manifold mesh is 
“watertight”—it has no gaps and separates the space on the inside of the surface 
from the space outside. It also looks like a surface everywhere on the mesh. 

The term manifold comes from the mathematical field of topology: roughly 
speaking, a manifold (specifically a two-dimensional manifold, or 2-manifold) is 
a surface in which a small neighborhood around any point could be smoothed out
into a bit of flat surface. This idea is most clearly explained by counterexample: 
if an edge on a mesh has three triangles connected to it, the neighborhood of a 
point on the edge is different from the neighborhood of one of the points in the 
interior of one of the triangles, because it has an extra “fin” sticking out of it 
(Figure 12.1). If the edge has exactly two triangles attached to it, points on the 
edge have neighborhoods just like points in the interior, only with a crease down 
the middle. Similarly, if the triangles sharing a vertex are in a configuration like 
the left one in Figure 12.2, the neighborhood is like two pieces of surface glued 
together at the center, which can’t be flattened without doubling it up. The vertex 
with the simpler neighborhood shown at right is just fine. 

Many algorithms assume that meshes are manifold, and it’s always a good 
idea to verify this property to prevent crashes or infinite loops if you are handed a 
malformed mesh as input. This verification boils down to checking that all edges 
are manifold and checking that all vertices are manifold by verifying the following 
conditions: 

• Every edge is shared by exactly two triangles. 

• Every vertex has a single, complete loop of triangles around it. 

Figure 12.1 illustrates how an edge can fail the first test by having too many
triangles, and Figure 12.2 illustrates how a vertex can fail the second test by having
two separate loops of triangles attached to it.
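
The edge test is easy to express in code. The Python sketch below assumes the mesh is given as a list of vertex-index triples; the function names are made up for this example, and only the first of the two conditions is checked.

```python
from collections import Counter

def edge_use_counts(triangles):
    """Count how many triangles use each undirected edge."""
    counts = Counter()
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            counts[(min(u, v), max(u, v))] += 1
    return counts

def edges_are_manifold(triangles):
    """True if every edge is shared by exactly two triangles."""
    return all(n == 2 for n in edge_use_counts(triangles).values())
```

A tetrahedron passes this test, while a lone triangle (edges used once) or three triangles meeting along one edge (an edge used three times) fails it.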

Manifold meshes are convenient, but sometimes it’s necessary to allow meshes 
to have edges, or boundaries. Such meshes are not manifolds—a point on the 
boundary has a neighborhood that is cut off on one side. They are not necessarily 
watertight. However, we can relax the requirements of a manifold mesh to those 
for a manifold with boundary without causing problems for most mesh processing 
algorithms. The relaxed conditions are: 

• Every edge is used by either one or two triangles. 

• Every vertex connects to a single edge-connected set of triangles. 

Figure 12.3 illustrates these conditions: from left to right, there is an edge with 
one triangle, a vertex whose neighboring triangles are in a single edge-connected 
set, and a vertex with two disconnected sets of triangles attached to it. 
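
Both relaxed conditions can be checked with a short Python routine. This is a sketch under the assumption that the mesh is a list of non-degenerate vertex-index triples; the function name is hypothetical.

```python
from collections import Counter, defaultdict

def is_manifold_with_boundary(triangles):
    """Check the two relaxed conditions on a list of vertex-index triples."""
    # Condition 1: every edge is used by one or two triangles.
    uses = Counter()
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            uses[frozenset((u, v))] += 1
    if any(n > 2 for n in uses.values()):
        return False

    # Condition 2: the triangles at each vertex form one edge-connected set.
    incident = defaultdict(list)
    for t, tri in enumerate(triangles):
        for v in tri:
            incident[v].append(t)
    for tris in incident.values():
        reached = {tris[0]}
        stack = [tris[0]]
        while stack:
            t = stack.pop()
            for u in tris:
                # Two triangles at the same vertex share an edge through it
                # exactly when they have a second vertex in common.
                if u not in reached and len(set(triangles[t]) & set(triangles[u])) >= 2:
                    reached.add(u)
                    stack.append(u)
        if len(reached) != len(tris):
            return False
    return True
```

The inner search is quadratic in the number of triangles at a vertex, which is acceptable for a validation pass since that number is usually small.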

Finally, in many applications it’s important to be able to distinguish the “front”
or “outside” of a surface from the “back” or “inside”; this is known as the
orientation of the surface. For a single triangle we define orientation based on the
order in which the vertices are listed: the front is the side from which the
triangle’s three vertices are arranged in counterclockwise order. A connected mesh is
consistently oriented if its triangles all agree on which side is the front, and this
is true if and only if every pair of adjacent triangles is consistently oriented.

Figure 12.1. Non-manifold (left) and manifold (right) interior edges.

Figure 12.2. Non-manifold (left) and manifold (right) interior vertices.

Figure 12.3. Conditions at the edge of a manifold with boundary (panels, left to
right: OK, OK, bad).

Figure 12.4. Triangles (B,A,C) and (D,C,A) are consistently oriented (OK),
whereas (B,A,C) and (A,C,D) are inconsistently oriented (bad).

Figure 12.5. A triangulated Möbius band, which is not orientable.

In a consistently oriented pair of triangles, the two shared vertices appear in 
opposite orders in the two triangles’ vertex lists (Figure 12.4). What’s important is 
consistency of orientation—some systems define the front using clockwise rather 
than counterclockwise order. 
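
This observation gives a simple test: if each triangle is read as three directed edges, a consistently oriented mesh never contains the same directed edge twice. A minimal Python sketch follows, assuming the mesh is a list of vertex triples in which no edge is used by more than two triangles; the function name is made up for this example.

```python
def consistently_oriented(triangles):
    """True if no directed edge (a, b) appears in more than one triangle."""
    seen = set()
    for a, b, c in triangles:
        for edge in ((a, b), (b, c), (c, a)):
            if edge in seen:
                return False  # two triangles traverse this edge the same way
            seen.add(edge)
    return True
```

Applied to the pairs from Figure 12.4, the test accepts (B,A,C) with (D,C,A) and rejects (B,A,C) with (A,C,D), because the latter pair both contain the directed edge from A to C.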

Any mesh that has non-manifold edges can’t be oriented consistently. But 
it’s also possible for a mesh to be a valid manifold with boundary (or even a 
manifold), and yet have no consistent way to orient the triangles—they are not 
orientable surfaces. An example is the Möbius band shown in Figure 12.5. This
is rarely an issue in practice, however. 


12.1.2 Indexed Mesh Storage 

A simple triangular mesh is shown in Figure 12.6. You could store these three 
triangles as independent entities, each of this form: 

Triangle {
    vector3 vertexPosition[3]
}


This would result in storing vertex b three times and the other vertices twice
each for a total of nine stored points (three vertices for each of three triangles). Or
you could instead arrange to share the common vertices and store only four,
resulting in a shared-vertex mesh. Logically, this data structure has triangles which
point to vertices which contain the vertex data:

Triangle {
    Vertex v[3]
}

Vertex {
    vector3 position  // or other vertex data
}

Note that the entries in the v array are references, or pointers, to Vertex objects;
the vertices are not contained in the triangle.

Separate triangles:

 #   vertex 0          vertex 1          vertex 2
 0   (a_x, a_y, a_z)   (b_x, b_y, b_z)   (c_x, c_y, c_z)
 1   (b_x, b_y, b_z)   (d_x, d_y, d_z)   (c_x, c_y, c_z)
 2   (a_x, a_y, a_z)   (d_x, d_y, d_z)   (b_x, b_y, b_z)

Shared vertices:

 triangles             vertices
 #   vertices          #   position
 0   (0, 1, 2)         0   (a_x, a_y, a_z)
 1   (1, 3, 2)         1   (b_x, b_y, b_z)
 2   (0, 3, 1)         2   (c_x, c_y, c_z)
                       3   (d_x, d_y, d_z)

Figure 12.6. A three-triangle mesh with four vertices, represented with separate triangles
(left) and with shared vertices (right).

Figure 12.7. The triangle-to-vertex references in a shared-vertex mesh.

In implementation, the vertices and triangles are normally stored in arrays,
with the triangle-to-vertex references handled by storing array indices:

IndexedMesh {
    int tInd[nt][3]
    vector3 verts[nv]
}

The index of the kth vertex of the ith triangle is found in tInd[i][k], and the
position of that vertex is stored in the corresponding row of the verts array; see
Figure 12.8 for an example. This way of storing a shared-vertex mesh is an
indexed triangle mesh.
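
Converting separate triangles into an indexed mesh amounts to assigning each distinct position an index the first time it is seen. The Python sketch below assumes positions are tuples that compare exactly equal when they are meant to be the same vertex; the function name `build_indexed_mesh` is made up for this example.

```python
def build_indexed_mesh(separate_triangles):
    """Convert triples of positions into (t_ind, verts) arrays."""
    verts = []      # one entry per distinct position
    index_of = {}   # position -> row in verts
    t_ind = []      # one index triple per triangle
    for tri in separate_triangles:
        row = []
        for p in tri:
            if p not in index_of:
                index_of[p] = len(verts)
                verts.append(p)
            row.append(index_of[p])
        t_ind.append(tuple(row))
    return t_ind, verts
```

Running it on the three triangles (a, b, c), (b, d, c), and (a, d, b) of Figure 12.6 produces four vertices and the index triples (0, 1, 2), (1, 3, 2), and (0, 3, 1).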

 verts[0]   x_0, y_0, z_0        tInd[0]   0, 2, 1
 verts[1]   x_1, y_1, z_1        tInd[1]   0, 3, 2
 verts[2]   x_2, y_2, z_2        tInd[2]   10, 2, 3
 verts[3]   x_3, y_3, z_3        tInd[3]   2, 10, 7

Figure 12.8. A larger triangle mesh, with part of its representation as an indexed triangle
mesh.

Figure 12.9. A triangle fan.

Figure 12.10. A triangle strip.

Separate triangles or shared vertices will both work well. Is there a space
advantage for sharing vertices? If our mesh has $n_v$ vertices and $n_t$ triangles, and
if we assume that the data for floats, pointers, and ints all require the same storage
(a dubious assumption), the space requirements are:

• Triangle. Three vectors per triangle, for $9n_t$ units of storage;

• IndexedMesh. One vector per vertex and three ints per triangle, for $3n_v + 3n_t$
units of storage.

The relative storage requirements depend on the ratio of $n_t$ to $n_v$.

As a rule of thumb, a large mesh has each vertex connected to about six triangles
(although there can be any number for extreme cases). Since each triangle
connects to three vertices, this means that there are generally twice as many
triangles as vertices in a large mesh: $n_t \approx 2n_v$. Making this substitution, we can
conclude that the storage requirements are $18n_v$ for the Triangle structure and $9n_v$
for IndexedMesh. Using shared vertices reduces storage requirements by about a
factor of two; and this seems to hold in practice for most implementations.

Is this factor of two worth the complication? I think the answer is yes, and it
becomes an even bigger win as soon as you start adding “properties” to the vertices.
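
The storage comparison is simple enough to tabulate directly. The small Python helper below is only a worked version of the two formulas, under the same assumption that floats and ints occupy one unit each; the function name is hypothetical.

```python
def storage_units(n_v, n_t):
    """Return (separate, indexed) storage for a mesh with positions only."""
    separate = 9 * n_t            # three 3-vectors per triangle
    indexed = 3 * n_v + 3 * n_t   # one 3-vector per vertex, three ints per triangle
    return separate, indexed
```

For a mesh with 1,000 vertices and 2,000 triangles this gives 18,000 units for separate triangles and 9,000 for the indexed mesh.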


12.1.3 Triangle Strips and Fans 

Indexed meshes are the most common in-memory representation of triangle 
meshes, because they achieve a good balance of simplicity, convenience, and 
compactness. They are also commonly used to transfer meshes over networks 
and between the application and graphics pipeline. In applications where even 
more compactness is desirable, the triangle vertex indices (which take up
two-thirds of the space in an indexed mesh with only positions at the vertices) can be
expressed more efficiently using triangle strips and triangle fans. 

A triangle fan is shown in Figure 12.9. In an indexed mesh, the triangles
array would contain [(0, 1, 2), (0, 2, 3), (0, 3, 4), (0, 4, 5)]. We are storing
12 vertex indices, although there are only six distinct vertices. In a triangle fan,
all the triangles share one common vertex, and the other vertices generate a set
of triangles like the vanes of a collapsible fan. The fan in the figure could be
specified with the sequence [0, 1, 2, 3, 4, 5]: the first vertex establishes the center,
and subsequently each pair of adjacent vertices (1-2, 2-3, etc.) creates a triangle.

The triangle strip is a similar concept, but it is useful for a wider range of
meshes. Here, vertices are added alternating top and bottom in a linear strip as
shown in Figure 12.10. The triangle strip in the figure could be specified by the
sequence [0, 1, 2, 3, 4, 5, 6, 7], and every subsequence of three adjacent vertices
(0-1-2, 1-2-3, etc.) creates a triangle. For consistent orientation, every other triangle
needs to have its order reversed. In the example, this results in the triangles
(0, 1, 2), (2, 1, 3), (2, 3, 4), (4, 3, 5), etc. For each new vertex that comes in, the oldest
vertex is forgotten and the order of the two remaining vertices is swapped. See
Figure 12.11 for a larger example.
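
Expanding a fan or a strip back into individual triangles takes only a few lines. The Python sketch below uses hypothetical function names and follows the ordering rules just described.

```python
def fan_to_triangles(fan):
    """The first vertex is the center; each adjacent pair adds a triangle."""
    return [(fan[0], fan[k], fan[k + 1]) for k in range(1, len(fan) - 1)]

def strip_to_triangles(strip):
    """Each run of three adjacent vertices adds a triangle.

    Every other triangle has its first two vertices swapped so that all
    triangles keep the same orientation.
    """
    triangles = []
    for k in range(len(strip) - 2):
        a, b, c = strip[k], strip[k + 1], strip[k + 2]
        triangles.append((a, b, c) if k % 2 == 0 else (b, a, c))
    return triangles
```

The fan [0, 1, 2, 3, 4, 5] expands to the four triangles listed above, and the strip [0, 1, 2, 3, 4, 5, 6, 7] expands to six triangles beginning (0, 1, 2), (2, 1, 3), (2, 3, 4), (4, 3, 5).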
Figure 12.11. Two triangle strips in the context of a larger mesh. Note that neither strip can
be extended to include the triangle marked with an asterisk.

In both strips and fans, n + 2 vertices suffice to describe n triangles, a substantial
savings over the 3n vertices required by a standard indexed mesh. Long
triangle strips will save approximately a factor of three if the program is
vertex-bound.

It might seem that triangle strips are only useful if the strips are very long, 
but even relatively short strips already gain most of the benefits. The savings in 
storage space (for only the vertex indices) are as follows: 


strip length    1     2     3     4     5     6     7     8     16    100   ∞
relative size   1.00  0.67  0.56  0.50  0.47  0.44  0.43  0.42  0.38  0.34  0.33

So, in fact, there is a rather rapid diminishing return as the strips grow longer.
Thus, even for an unstructured mesh, it is worthwhile to use some greedy algorithm
to gather them into short strips.
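
The entries in the table follow from the vertex counts: a strip of n triangles stores n + 2 indices where an indexed mesh stores 3n. A one-line Python helper (the name is made up for this example) reproduces them.

```python
def relative_index_storage(n):
    """Index storage of an n-triangle strip relative to an indexed mesh."""
    return (n + 2) / (3 * n)
```

For instance, a strip of four triangles already halves the index storage, and the ratio approaches one-third as n grows.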


12.1.4 Data Structures for Mesh Connectivity 

Indexed meshes, strips, and fans are all good, compact representations for static 
meshes. However, they do not readily allow for meshes to be modified. In order to 
efficiently edit meshes, more complicated data structures are needed to efficiently 
answer queries such as: 

• Given a triangle, what are the three adjacent triangles?

• Given an edge, which two triangles share it?

• Given a vertex, which faces share it?

• Given a vertex, which edges share it?

There are many data structures for triangle meshes, polygonal meshes, and polygonal
meshes with holes (see the notes at the end of the chapter for references). In
many applications the meshes are very large, so an efficient representation can be
crucial.

The most straightforward, though bloated, implementation would be to have
three types, Vertex, Edge, and Triangle, and to just store all the relationships
directly:

Triangle {
    Vertex v[3]
    Edge e[3]
}

Edge {
    Vertex v[2]
    Triangle t[2]
}

Vertex {
    Triangle t[]
    Edge e[]
}

This lets us directly look up answers to the connectivity questions above, but 
because this information is all inter-related, it stores more than is really needed. 
Also, storing connectivity in vertices makes for variable-length data structures 
(since vertices can have arbitrary numbers of neighbors), which are generally less 
efficient to implement. Rather than committing to store all these relationships 
explicitly, it is best to define a class interface to answer these questions, behind 
which a more efficient data structure can hide. It turns out we can store only some 
of the connectivity and efficiently recover the other information when needed. 

The fixed-size arrays in the Edge and Triangle classes suggest that it will be 
more efficient to store the connectivity information there. In fact, for polygon 
meshes, in which polygons have arbitrary numbers of edges and vertices, only 
edges have fixed-size connectivity information, which leads to many traditional 
mesh data structures being based on edges. But for triangle-only meshes, storing 
connectivity in the (less numerous) faces is appealing. 

A good mesh data structure should be reasonably compact and allow efficient 
answers to all adjacency queries. Efficient means constant-time: the time to find 
neighbors should not depend on the size of the mesh. We'll look at three data
structures for meshes, one based on triangles and two based on edges. 


The Triangle-Neighbor Structure 

We can create a compact mesh data structure based on triangles by augmenting the 
basic shared-vertex mesh with pointers from the triangles to the three neighboring 
triangles, and a pointer from each vertex to one of the adjacent triangles (it doesn’t 
matter which one); see Figure 12.12: 

Triangle {
    Triangle nbr[3];
    Vertex v[3];
}

Vertex {
    // ... per-vertex data ...
    Triangle t;  // any adjacent tri
}

In the array Triangle.nbr, the kth entry points to the neighboring triangle that
shares vertices k and k + 1. We call this structure the triangle-neighbor structure.
Starting from standard indexed mesh arrays, it can be implemented with two
additional arrays: one that stores the three neighbors of each triangle, and one
that stores a single neighboring triangle for each vertex (see Figure 12.13 for an
example):

Mesh {
    // ... per-vertex data ...
    int tInd[nt][3];  // vertex indices
    int tNbr[nt][3];  // indices of neighbor triangles
    int vTri[nv];     // index of any adjacent triangle
}

Figure 12.12. The references between triangles and vertices in the triangle
neighbor structure.

Figure 12.13. The triangle neighbor structure as encoded in arrays, and the sequence that
is followed in traversing the neighboring triangles of vertex 2.

Of course, a real program would do something with the triangles as it found
them.


Clearly the neighboring triangles and vertices of a triangle can be found directly
in the data structure, but by using this triangle adjacency information carefully
it is also possible to answer connectivity queries about vertices in constant
time. The idea is to move from triangle to triangle, visiting only the triangles
adjacent to the relevant vertex. If triangle t has vertex v as its kth vertex, then
the triangle t.nbr[k] is the next triangle around v in the clockwise direction. This
observation leads to the following algorithm to traverse all the triangles adjacent
to a given vertex:

TrianglesOfVertex(v) {
    t = v.t
    do {
        find i such that (t.v[i] == v)
        t = t.nbr[i]
    } while (t != v.t)
}

This operation finds each subsequent triangle in constant time—even though a 
search is required to find the position of the central vertex in each triangle’s vertex 
list, the vertex lists have constant size so the search takes constant time. However, 
that search is awkward and requires extra branching. 
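
To make the structure concrete, here is a Python sketch that builds the two extra arrays from an indexed mesh and then walks around a vertex. It assumes a closed, consistently oriented manifold mesh; the function names and the dictionary of directed edges used during construction are choices made for this example.

```python
def build_triangle_neighbors(t_ind, n_verts):
    """Return (t_nbr, v_tri) for a closed, consistently oriented manifold mesh."""
    # Record which triangle owns each directed edge (a, b).
    owner = {}
    for t, tri in enumerate(t_ind):
        for k in range(3):
            owner[(tri[k], tri[(k + 1) % 3])] = t
    # Edge k runs from vertex k to vertex k + 1; the neighbor across it
    # owns the same edge in the opposite direction.
    t_nbr = [[owner[(tri[(k + 1) % 3], tri[k])] for k in range(3)]
             for tri in t_ind]
    # Any adjacent triangle will do for each vertex; keep the last one seen.
    v_tri = [None] * n_verts
    for t, tri in enumerate(t_ind):
        for v in tri:
            v_tri[v] = t
    return t_nbr, v_tri

def triangles_of_vertex(v, t_ind, t_nbr, v_tri):
    """Collect the triangles around vertex v by stepping across shared edges."""
    result = []
    t = v_tri[v]
    while True:
        result.append(t)
        i = list(t_ind[t]).index(v)  # the constant-size search described above
        t = t_nbr[t][i]
        if t == v_tri[v]:
            break
    return result
```

On a tetrahedron, every vertex yields exactly the three faces that touch it.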

A small refinement can avoid these searches. The problem is that once we 
follow a pointer from one triangle to the next, we don’t know from which way 
we came: we have to search the triangle’s vertices to find the vertex that connects
back to the previous triangle. To solve this, instead of storing pointers to
neighboring triangles, we can store pointers to specific edges of those triangles by 
storing an index with the pointer: 

Triangle {
    Edge nbr[3];
    Vertex v[3];
}

Edge {  // the i-th edge of triangle t
    Triangle t;
    int i;  // in {0,1,2}
}

Vertex {
    // ... per-vertex data ...
    Edge e;  // any edge leaving vertex
}

In practice the Edge is stored by borrowing two bits of storage from the triangle 
index t to store the edge index i, so that the total storage requirements remain the 
same. 

In this structure the neighbor array for a triangle tells which of the neighboring 
triangles’ edges are shared with the three edges of that triangle. With this extra 
information, we always know where to find the original triangle, which leads to 
an invariant of the data structure: for any jth edge of any triangle t,

t.nbr[j].t.nbr[t.nbr[j].i].t == t.

Knowing which edge we came in through lets us know immediately which edge to
leave through in order to continue traversing around a vertex, leading to a
streamlined algorithm:

TrianglesOfVertex(v) {
    {t, i} = v.e;
    do {
        {t, i} = t.nbr[i];
        i = (i + 1) mod 3;
    } while (t != v.e.t);
}
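
The Python sketch below shows one way this refinement could look with array storage, packing the edge index into the two low bits of the triangle index. It assumes a closed, consistently oriented manifold mesh, and all names are hypothetical. After crossing into the neighboring triangle, the walk advances the edge index by one (mod 3) so that it leaves through the next edge that starts at v.

```python
def pack(t, i):
    """Store edge i of triangle t in one integer (two low bits hold i)."""
    return (t << 2) | i

def unpack(e):
    return e >> 2, e & 3

def build_edge_neighbors(t_ind, n_verts):
    """Return (e_nbr, v_edge): packed neighbor edges and one edge per vertex."""
    owner = {}
    for t, tri in enumerate(t_ind):
        for k in range(3):
            owner[(tri[k], tri[(k + 1) % 3])] = pack(t, k)
    # Across edge k lies the edge that runs between the same vertices in reverse.
    e_nbr = [[owner[(tri[(k + 1) % 3], tri[k])] for k in range(3)]
             for tri in t_ind]
    v_edge = [None] * n_verts
    for t, tri in enumerate(t_ind):
        for k in range(3):
            v_edge[tri[k]] = pack(t, k)  # edge k leaves vertex tri[k]
    return e_nbr, v_edge

def triangles_of_vertex(v, e_nbr, v_edge):
    """Walk around v without searching any vertex lists."""
    result = []
    t, i = unpack(v_edge[v])
    start = t
    while True:
        result.append(t)
        t, i = unpack(e_nbr[t][i])  # cross the edge that leaves v
        i = (i + 1) % 3             # next edge in the new triangle leaves v
        if t == start:
            break
    return result
```

Because the two low bits hold the edge index, a 32-bit integer in a fixed-width language would still leave 30 bits for the triangle index.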

The triangle-neighbor structure is quite compact. For a mesh with only vertex
positions, we are storing four numbers (three coordinates and an edge) per vertex
and six (three vertex indices and three edges) per face, for a total of
$4n_v + 6n_t \approx 16n_v$ units of storage, compared with $9n_v$ for the basic indexed mesh.

The triangle neighbor structure as presented here works only for manifold 
meshes, because it depends on returning to the starting triangle to terminate the 
traversal of a vertex’s neighbors, which will not happen at a boundary vertex that 
doesn’t have a full cycle of triangles. However, it is not difficult to generalize 
it to manifolds with boundary, by introducing a suitable sentinel value (such as
−1) for the neighbors of boundary triangles and taking care that the boundary
vertices point to the most counterclockwise neighboring triangle, rather than to 
any arbitrary triangle. 

The Winged-Edge Structure 

One widely used mesh data structure that stores connectivity information at the
edges instead of the faces is the winged-edge data structure. This data structure
makes edges the first-class citizen of the data structure, as illustrated in
Figures 12.14 and 12.15.

Figure 12.14. An example of a winged-edge mesh structure, stored in arrays.

Figure 12.15. A tetrahedron and the associated elements for a winged-edge data structure.
The two small tables are not unique; each vertex and face stores any one of the edges with
which it is associated.

In a winged-edge mesh, each edge stores pointers to the two vertices it connects
(the head and tail vertices), the two faces it is part of (the left and right
faces), and, most importantly, the next and previous edges in the counterclockwise
traversal of its left and right faces (Figure 12.16). Each vertex and face also
stores a pointer to a single, arbitrary edge that connects to it:

Edge {
    Edge lprev, lnext, rprev, rnext;
    Vertex head, tail;
    Face left, right;
}

Face {
    // ... per-face data ...
    Edge e;  // any adjacent edge
}

Vertex {
    // ... per-vertex data ...
    Edge e;  // any incident edge
}


The winged-edge data structure supports constant-time access to the edges of 
a face or of a vertex, and from those edges the adjoining vertices or faces can be 
found: 

EdgesOfVertex(v) {
    e = v.e;
    do {
        if (e.tail == v)
            e = e.lprev;
        else
            e = e.rprev;
    } while (e != v.e);
}

EdgesOfFace(f) {
    e = f.e;
    do {
        if (e.left == f)
            e = e.lnext;
        else
            e = e.rnext;
    } while (e != f.e);
}

Figure 12.16. The references from an edge to the neighboring edges, faces,
and vertices in the winged-edge structure.


These same algorithms and data structures will work equally well in a polygon 
mesh that isn’t limited to triangles; this is one important advantage of edge-based 
structures. 
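
As a rough illustration of the winged-edge traversals, the following Python sketch builds the structure from a list of counterclockwise face loops and then walks the edges of a vertex and of a face. The builder `build_winged_edge`, the use of integer ids for vertices and faces, and the dictionaries that stand in for the per-vertex and per-face edge pointers are assumptions made for this example; only the two traversal loops mirror the pseudocode above.

```python
class Edge:
    def __init__(self, tail, head):
        self.tail, self.head = tail, head
        self.left = self.right = None          # face ids
        self.lprev = self.lnext = self.rprev = self.rnext = None

def build_winged_edge(faces):
    """faces: counterclockwise vertex loops of a closed, consistently oriented mesh."""
    edges, loops = {}, []
    for loop in faces:
        ring = []
        for i, u in enumerate(loop):
            v = loop[(i + 1) % len(loop)]
            e = edges.setdefault(frozenset((u, v)), Edge(u, v))
            ring.append((e, e.tail == u))      # True if the face runs tail -> head
        loops.append(ring)
    vertex_edge, face_edge = {}, {}
    for f, ring in enumerate(loops):
        for i, (e, forward) in enumerate(ring):
            prev_e, next_e = ring[i - 1][0], ring[(i + 1) % len(ring)][0]
            if forward:                        # f lies to the left of tail -> head
                e.left, e.lprev, e.lnext = f, prev_e, next_e
            else:                              # f lies to the right
                e.right, e.rprev, e.rnext = f, prev_e, next_e
            face_edge.setdefault(f, e)         # any adjacent edge
            vertex_edge.setdefault(e.tail, e)  # any incident edge
            vertex_edge.setdefault(e.head, e)
    return list(edges.values()), vertex_edge, face_edge

def edges_of_vertex(v, vertex_edge):
    out, e = [], vertex_edge[v]
    while True:
        out.append(e)
        e = e.lprev if e.tail == v else e.rprev
        if e is vertex_edge[v]:
            return out

def edges_of_face(f, face_edge):
    out, e = [], face_edge[f]
    while True:
        out.append(e)
        e = e.lnext if e.left == f else e.rnext
        if e is face_edge[f]:
            return out
```

For a tetrahedron given as `[(0, 2, 1), (0, 1, 3), (1, 2, 3), (0, 3, 2)]`, each vertex and each face yields a ring of three edges. Nothing in the two traversal functions depends on the faces being triangles.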

As with any data structure, the winged-edge data structure makes a variety of 
time/space trade-offs. For example, we can eliminate the prev references. This 
makes it more difficult to traverse clockwise around faces or counterclockwise 
around vertices, but when we need to know the previous edge, we can always 
follow the successor edges in a circle until we get back to the original edge. This 
saves space, but it makes some operations slower. (See the chapter notes for more 
information on these trade-offs.) 

The Half-Edge Structure 

The winged-edge structure is quite elegant, but it has one remaining awkwardness: 
the need to constantly check which way the edge is oriented before moving 
to the next edge. This check is directly analogous to the search we saw in the basic 
version of the triangle neighbor structure: we are looking to find out whether we 
entered the present edge from the head or from the tail. The solution is also almost 
indistinguishable: rather than storing data for each edge, we store data for each 
half-edge. There is one half-edge for each of the two triangles that share an edge, 
and the two half-edges are oriented oppositely, each oriented consistently with its 
own triangle. 

Figure 12.17. An example of a half-edge mesh structure, stored in arrays. 

The data normally stored in an edge is split between the two half-edges. Each 
half-edge points to the face on its side of the edge and to the vertex at its head, and 
each contains the edge pointers for its face. It also points to its neighbor on the 
other side of the edge, from which the other half of the information can be found. 
Like the winged-edge, a half-edge can contain pointers to both the previous and 
next half-edges around its face, or only to the next half-edge. We'll show the 
example that uses a single pointer. 

    HEdge {
        HEdge pair, next;
        Vertex v;
        Face f;
    }

    Face {
        // ... per-face data ...
        HEdge h; // any h-edge of this face
    }

    Vertex {
        // ... per-vertex data ...
        HEdge h; // any h-edge pointing away from this vertex
    }

Figure 12.18. The references from a half-edge to its neighboring mesh components. 

Traversing a half-edge structure is just like traversing a winged-edge structure 
except that we no longer need to check orientation, and we follow the pair pointer 
to access the edges in the opposite face. 

    EdgesOfVertex(v) {
        h = v.h;
        do {
            h = h.pair.next;
        } while (h != v.h);
    }

    EdgesOfFace(f) {
        h = f.h;
        do {
            h = h.next;
        } while (h != f.h);
    }

The vertex traversal here is clockwise, which is necessary because of omitting 
the prev pointer from the structure. 

Because half-edges are generally allocated in pairs (at least in a mesh with 
no boundaries), many implementations can do away with the pair pointers. For 
instance, in an implementation based on array indexing (such as shown in 
Figure 12.17), the array can be arranged so that an even-numbered edge i always 
pairs with edge i + 1 and an odd-numbered edge j always pairs with edge j - 1. 
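
The half-edge structure and the paired array layout can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the builder `build_half_edges`, the integer vertex and face ids, and the dictionaries used for the per-vertex and per-face pointers are all invented here, and the mesh is assumed to be closed. The per-vertex half-edge is taken to leave its vertex, so that following `pair.next` stays at that vertex. Half-edges are allocated two at a time, so the partner of entry `k` is entry `k ^ 1`.

```python
class HEdge:
    def __init__(self, head):
        self.v = head                 # vertex at the head of this half-edge
        self.f = None                 # face on this half-edge's side
        self.pair = self.next = None

def build_half_edges(faces):
    """faces: counterclockwise vertex loops of a closed mesh."""
    hedges, index = [], {}            # index: (tail, head) -> slot in hedges
    for loop in faces:
        for i, u in enumerate(loop):
            v = loop[(i + 1) % len(loop)]
            if (u, v) not in index:   # allocate both halves together
                index[(u, v)] = len(hedges)
                index[(v, u)] = len(hedges) + 1
                hedges += [HEdge(v), HEdge(u)]
    for k, h in enumerate(hedges):
        h.pair = hedges[k ^ 1]        # even k pairs with k + 1, odd k with k - 1
    vertex_h, face_h = {}, {}
    for f, loop in enumerate(faces):
        n = len(loop)
        ring = [hedges[index[(loop[i], loop[(i + 1) % n])]] for i in range(n)]
        for i, h in enumerate(ring):
            h.f, h.next = f, ring[(i + 1) % n]
            vertex_h.setdefault(loop[i], h)   # an h-edge leaving loop[i]
        face_h[f] = ring[0]
    return hedges, vertex_h, face_h

def edges_of_vertex(v, vertex_h):
    out, h = [], vertex_h[v]
    while True:
        out.append(h)
        h = h.pair.next
        if h is vertex_h[v]:
            return out

def edges_of_face(f, face_h):
    out, h = [], face_h[f]
    while True:
        out.append(h)
        h = h.next
        if h is face_h[f]:
            return out
```

With the pairing fixed by position, the `pair` field could be dropped entirely and replaced by index arithmetic; it is kept here only so that the traversal code reads like the pseudocode.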

In addition to the simple traversal algorithms shown in this chapter, all three of 
these mesh topology structures can support “mesh surgery” operations of various 
sorts, such as splitting or collapsing vertices, swapping edges, adding or removing 
triangles, etc. 


12.2 Scene Graphs 

A triangle mesh manages a collection of triangles that constitute an object in a 
scene, but another universal problem in graphics applications is arranging the 
objects in the desired positions. As we saw in Chapter 6, this is done using 
transformations, but complex scenes can contain a great many transformations and 
organizing them well makes the scene much easier to manipulate. Most scenes 
admit a hierarchical organization, and the transformations can be managed 
according to this hierarchy using a scene graph. 

To motivate the scene-graph data structure, we will use the hinged pendulum 
shown in Figure 12.19. Consider how we would draw the top part of the pendulum: 

    M_1 = rotate(θ)
    M_2 = translate(p)
    M_3 = M_2 M_1
    Apply M_3 to all points in upper pendulum

Figure 12.19. A hinged pendulum. On the left are the two pieces in their “local” coordinate 
systems. The hinge of the bottom piece is at point b and the attachment for the bottom piece 
is at its local origin. The degrees of freedom for the assembled object are the angles 
$(\theta, \phi)$ and the location p of the top hinge. 

The bottom is more complicated, but we can take advantage of the fact that it is 
attached to the bottom of the upper pendulum at point b in the local coordinate 
system. First, we rotate the lower pendulum so that it is at an angle $\phi$ relative to 
its initial position. Then, we move it so that its top hinge is at point b. Now it is 
at the appropriate position in the local coordinates of the upper pendulum, and it 
can then be moved along with that coordinate system. The composite transform 
for the lower pendulum is: 

    M_a = rotate(φ)
    M_b = translate(b)
    M_c = M_b M_a
    M_d = M_3 M_c
    Apply M_d to all points in lower pendulum

Thus, we see that the lower pendulum not only lives in its own local coordinate 
system, but also that coordinate system itself is moved along with that of the upper 
pendulum. 
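
The two composite transforms can be checked numerically with a small sketch. The code below assumes 2D points and 3x3 homogeneous matrices stored as nested lists; the helper names (`matmul`, `rotate`, `translate`, `apply`, `pendulum_matrices`) are invented for this illustration and are not part of any API discussed in the text.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rotate(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def translate(p):
    return [[1.0, 0.0, p[0]], [0.0, 1.0, p[1]], [0.0, 0.0, 1.0]]

def apply(M, point):
    x, y = point
    return (M[0][0] * x + M[0][1] * y + M[0][2],
            M[1][0] * x + M[1][1] * y + M[1][2])

def pendulum_matrices(theta, phi, p, b):
    M1, M2 = rotate(theta), translate(p)
    M3 = matmul(M2, M1)          # upper pendulum: rotate, then move to p
    Ma, Mb = rotate(phi), translate(b)
    Mc = matmul(Mb, Ma)          # lower piece, placed in the upper piece's frame
    Md = matmul(M3, Mc)          # then carried along with the upper piece
    return M3, Md
```

A useful sanity check is that the hinge stays connected: for any angles, applying `M3` to the point b gives the same position as applying `Md` to the local origin of the lower piece.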

We can encode the pendulum in a data structure that makes management of 
these coordinate system issues easier, as shown in Figure 12.20. The appropriate 
matrix to apply to an object is just the product of all the matrices in the chain from 
the object to the root of the data structure. For example, consider the model of a 
ferry that has a car that can move freely on the deck of the ferry, and wheels that 
each move relative to the car as shown in Figure 12.21. 

As with the pendulum, each object should be transformed by the product of 
the matrices in the path from the root to the object: 

• ferry transform using $M_0$ 

• car body transform using $M_0 M_1$ 

• left wheel transform using $M_0 M_1 M_2$ 

• right wheel transform using $M_0 M_1 M_3$ 

An efficient implementation can be achieved using a matrix stack, a data structure 
supported by many APIs. A matrix stack is manipulated using push and pop 
operations that add and delete matrices from the right-hand side of a matrix product. 
For example, calling: 

    push(M_0)
    push(M_1)
    push(M_2)

creates the active matrix $M = M_0 M_1 M_2$. A subsequent call to pop() strips the 
last matrix added so that the active matrix becomes $M = M_0 M_1$. Combining 
the matrix stack with a recursive traversal of a scene graph gives us: 


    function traverse(node)
        push(M_local)
        draw object using composite matrix from stack
        traverse(left child)
        traverse(right child)
        pop()

Figure 12.20. The scene graph for the hinged pendulum of Figure 12.19. 

Figure 12.21. A ferry, a car on the ferry, and the wheels of the car (only 
two shown) are stored in a scene graph. 

There are many variations on scene graphs but all follow the basic idea above. 
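
A compact Python sketch of this traversal is given below, using the ferry example with made-up 2D translations. The class names `MatrixStack` and `Node`, the `draw` callback, and the choice to let a node hold any number of children (rather than a left and a right child) are assumptions for illustration. The stack stores running products, so the active matrix is always the top entry and pop is a constant-time operation.

```python
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def translate(tx, ty):
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]

class MatrixStack:
    def __init__(self):
        self._products = [IDENTITY]

    def push(self, M):
        # Multiply on the right-hand side of the current product.
        self._products.append(matmul(self._products[-1], M))

    def pop(self):
        self._products.pop()

    @property
    def active(self):
        return self._products[-1]

class Node:
    def __init__(self, name, local, children=()):
        self.name, self.local, self.children = name, local, list(children)

def traverse(node, stack, draw):
    stack.push(node.local)
    draw(node.name, stack.active)      # composite matrix from the stack
    for child in node.children:
        traverse(child, stack, draw)
    stack.pop()

wheels = [Node("left wheel", translate(-1.0, -1.0)),
          Node("right wheel", translate(1.0, -1.0))]
ferry = Node("ferry", translate(10.0, 0.0),
             [Node("car", translate(2.0, 1.0), wheels)])
```

Calling `traverse(ferry, MatrixStack(), draw)` hands each object the product of the matrices on the path from the root to that object, which is the rule stated for the ferry, car body, and wheels.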


12.3 Spatial Data Structures 

In many, if not all, graphics applications, the ability to quickly locate geometric 
objects in particular regions of space is important. Ray tracers need to find objects 
that intersect rays; interactive applications navigating an environment need to find 
the objects visible from any given viewpoint; games and physical simulations 
require detecting when and where objects collide. All these needs can be supported 
by various spatial data structures designed to organize objects in space so they 
can be looked up efficiently. 

In this section we will discuss examples of three general classes of spatial data 
structures. Structures that group objects together into a hierarchy are object 
partitioning schemes: objects are divided into disjoint groups, but the groups may 
end up overlapping in space. Structures that divide space into disjoint regions 
are space partitioning schemes: space is divided into separate partitions, but one 
object may have to intersect more than one partition. Space partitioning schemes 
can be regular, in which space is divided into uniformly shaped pieces, or 
irregular, in which space is divided adaptively into irregular pieces, with smaller pieces 
where there are more and smaller objects. 



Figure 12.22. Left: a uniform partitioning of space. Right: adaptive bounding-box hierarchy. 
Image courtesy David DeMarle. 

We will use ray tracing as the primary motivation while discussing these 
structures, though they can all also be used for view culling or collision detection. In 
Chapter 4, all objects were looped over while checking for intersections. For N 
objects, this is an O(N) linear search and is thus slow for large scenes. Like most 
search problems, the ray-object intersection can be computed in sub-linear time 
using “divide and conquer” techniques, provided we can create an ordered data 
structure as a preprocess. There are many techniques to do this. 

This section discusses three of these techniques in detail: bounding volume 
hierarchies (Rubin & Whitted, 1980; Whitted, 1980; Goldsmith & Salmon, 1987), 
uniform spatial subdivision (Cleary et al., 1983; Fujimoto et al., 1986; 
Amanatides & Woo, 1987), and binary space partitioning (Glassner, 1984; Jansen, 
1986; Havran, 2000). An example of the first two strategies is shown in 
Figure 12.22. 


12.3.1 Bounding Boxes 


A key operation in most intersection-acceleration schemes is computing the 
intersection of a ray with a bounding box (Figure 12.23). This differs from 
conventional intersection tests in that we do not need to know where the ray hits the box; 
we only need to know whether it hits the box. 

To build an algorithm for ray-box intersection, we begin by considering a 2D 
ray whose direction vector has positive x and y components. We can generalize 
this to arbitrary 3D rays later. The 2D bounding box is defined by two horizontal 
and two vertical lines: 

$$x = x_{\min}, \qquad x = x_{\max}, \qquad y = y_{\min}, \qquad y = y_{\max}.$$

The points bounded by these lines can be described in interval notation: 

$$(x, y) \in [x_{\min}, x_{\max}] \times [y_{\min}, y_{\max}].$$

As shown in Figure 12.24, the intersection test can be phrased in terms of these 
intervals. First, we compute the ray parameter where the ray hits the line 
$x = x_{\min}$: 

$$t_{x\min} = \frac{x_{\min} - x_e}{x_d}.$$

Figure 12.23. The ray is only tested for intersection with the surfaces if it hits the 
bounding box. 

Figure 12.24. The ray will be inside the interval $x \in [x_{\min}, x_{\max}]$ for some interval in its 
parameter space $t \in [t_{x\min}, t_{x\max}]$. A similar interval exists for the y interval. The ray intersects 
the box if it is in both the x interval and y interval at the same time, i.e., the intersection of the 
two one-dimensional intervals is not empty. 


We then make similar computations for $t_{x\max}$, $t_{y\min}$, and $t_{y\max}$. The ray hits the 
box if and only if the intervals $[t_{x\min}, t_{x\max}]$ and $[t_{y\min}, t_{y\max}]$ overlap, i.e., their 
intersection is non-empty. In pseudocode this algorithm is: 

    t_xmin = (x_min - x_e)/x_d
    t_xmax = (x_max - x_e)/x_d
    t_ymin = (y_min - y_e)/y_d
    t_ymax = (y_max - y_e)/y_d
    if (t_xmin > t_ymax) or (t_ymin > t_xmax) then
        return false
    else
        return true

The if statement may seem non-obvious. To see the logic of it, note that there is 
no overlap if the first interval is either entirely to the right or entirely to the left of 
the second interval. 

The first thing we must address is the case when $x_d$ or $y_d$ is negative. If $x_d$ is 
negative, then the ray will hit $x_{\max}$ before it hits $x_{\min}$. Thus the code for computing 
$t_{x\min}$ and $t_{x\max}$ expands to: 


    if (x_d ≥ 0) then
        t_xmin = (x_min - x_e)/x_d
        t_xmax = (x_max - x_e)/x_d
    else
        t_xmin = (x_max - x_e)/x_d
        t_xmax = (x_min - x_e)/x_d


A similar code expansion must be made for the y cases. A major concern is that 
horizontal and vertical rays have a zero value for $y_d$ and $x_d$, respectively. This 
will cause a divide by zero, which may be a problem. However, before addressing 
this directly, we check whether IEEE floating point computation handles these 
cases gracefully for us. Recall from Section 1.5 the rules for divide by zero: for 
any positive real number $a$, 

$$+a/0 = +\infty, \qquad -a/0 = -\infty.$$

Consider the case of a vertical ray where $x_d = 0$ and $y_d > 0$. We can then 
calculate 

$$t_{x\min} = \frac{x_{\min} - x_e}{0}, \qquad t_{x\max} = \frac{x_{\max} - x_e}{0}.$$

There are three possibilities of interest: 

1. $x_e < x_{\min}$ (no hit); 

2. $x_{\min} < x_e < x_{\max}$ (hit); 

3. $x_{\max} < x_e$ (no hit). 

For the first case we have 

$$t_{x\min} = \frac{\text{positive number}}{0}, \qquad t_{x\max} = \frac{\text{positive number}}{0}.$$

This yields the interval $(t_{x\min}, t_{x\max}) = (\infty, \infty)$. That interval will not overlap 
with any interval, so there will be no hit, as desired. For the second case, we have 

$$t_{x\min} = \frac{\text{negative number}}{0}, \qquad t_{x\max} = \frac{\text{positive number}}{0}.$$

This yields the interval $(t_{x\min}, t_{x\max}) = (-\infty, \infty)$, which will overlap with all 
intervals and thus will yield a hit as desired. The third case results in the interval 
$(-\infty, -\infty)$, which yields no hit, as desired. Because these cases work as desired, 
we need no special checks for them. As is often the case, IEEE floating point 
conventions are our ally. However, there is still a problem with this approach. 

Figure 12.25. A 2D ray $\mathbf{e} + t\mathbf{d}$ is tested against a 2D bounding box. 

Consider the code segment: 

    if (x_d ≥ 0) then
        t_min = (x_min - x_e)/x_d
        t_max = (x_max - x_e)/x_d
    else
        t_min = (x_max - x_e)/x_d
        t_max = (x_min - x_e)/x_d

This code breaks down when $x_d = -0$: the test $-0 \ge 0$ succeeds, but dividing by 
$-0$ produces infinities of the opposite sign from those expected in that branch. This 
can be overcome by testing on the reciprocal of $x_d$ (A. Williams et al., 2005): 

    a = 1/x_d
    if (a ≥ 0) then
        t_min = a(x_min - x_e)
        t_max = a(x_max - x_e)
    else
        t_min = a(x_max - x_e)
        t_max = a(x_min - x_e)
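
The reciprocal-based test can be tried out in Python with one caveat: Python raises `ZeroDivisionError` for float division by zero instead of returning an IEEE infinity, so the sketch below builds the signed infinity by hand with `math.copysign`. The function names, the generalization to any number of axes, and the clipping against a ray interval `[t0, t1]` are assumptions added for this example.

```python
import math

def slab_interval(lo, hi, origin, direction):
    """Parameter interval in which origin + t*direction lies in [lo, hi] on one axis."""
    if direction == 0.0:
        a = math.copysign(math.inf, direction)   # keeps the sign of -0.0
    else:
        a = 1.0 / direction
    if a >= 0.0:
        return a * (lo - origin), a * (hi - origin)
    return a * (hi - origin), a * (lo - origin)

def ray_hits_box(box_min, box_max, origin, direction, t0=0.0, t1=math.inf):
    """True if the ray passes through the box for some t in [t0, t1]."""
    for lo, hi, e, d in zip(box_min, box_max, origin, direction):
        t_near, t_far = slab_interval(lo, hi, e, d)
        t0, t1 = max(t0, t_near), min(t1, t_far)
        if t0 > t1:          # the per-axis intervals no longer overlap
            return False
    return True
```

For the unit square, a horizontal ray from (-1, 0.5) is reported as a hit whether its y direction is written as 0.0 or -0.0, and a horizontal ray from (-1, 2) is reported as a miss in both cases. A ray whose origin lies exactly on a slab boundary with a zero direction component produces a NaN (infinity times zero), which this sketch does not treat specially.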


12.3.2 Hierarchical Bounding Boxes 

The basic idea of hierarchical bounding boxes can be seen by the common tactic 
of placing an axis-aligned 3D bounding box around all the objects as shown in 
Figure 12.25. Rays that hit the bounding box will actually be more expensive 
to compute than in a brute force search, because testing for intersection with the 
box is not free. However, rays that miss the box are cheaper than the brute force 
search. Such bounding boxes can be made hierarchical by partitioning the set of 
objects in a box and placing a box around each partition as shown in Figure 12.26. 
The data structure for the hierarchy shown in Figure 12.27 might be a tree with 
the large bounding box at the root and the two smaller bounding boxes as left and 
right subtrees. These would in turn each point to a list of three triangles. The 
intersection of a ray with this particular hard-coded tree would be: 

    if (ray hits root box) then
        if (ray hits left subtree box) then
            check three triangles for intersection
        if (ray intersects right subtree box) then
            check other three triangles for intersection
        if (an intersection is returned from each subtree) then
            return the closest of the two hits
        else if (an intersection is returned from exactly one subtree) then
            return that intersection
        else
            return false
    else
        return false

Some observations related to this algorithm are that there is no geometric ordering 
between the two subtrees, and there is no reason a ray might not hit both subtrees. 
Indeed, there is no reason that the two subtrees might not overlap. 

A key point of such data hierarchies is that a box is guaranteed to bound all 
objects that are below it in the hierarchy, but it is not guaranteed to contain 
all objects that overlap it spatially, as shown in Figure 12.27. This makes this 
geometric search somewhat more complicated than a traditional binary search on 
strictly ordered one-dimensional data. The reader may note that several possible 
optimizations present themselves. We defer optimizations until we have a full 
hierarchical algorithm. 

If we restrict the tree to be binary and require that each node in the tree have a 
bounding box, then this traversal code extends naturally. Further, assume that all 
nodes are either leaves in the tree and contain a primitive, or that they contain one 
or two subtrees. 

The bvh-node class should be of type surface, so it should implement 
surface::hit. The data it contains should be simple: 

    class bvh-node subclass of surface
        virtual bool hit(ray e + td, real t_0, real t_1, hit-record rec)
        virtual box bounding-box()
        surface-pointer left
        surface-pointer right
        box bbox

Figure 12.26. The bounding boxes can be nested by creating boxes around 
subsets of the model. 

Figure 12.27. The gray box is a tree node that points to the three gray 
spheres, and the thick black box points to the three black spheres. Note that not all 
spheres enclosed by the box are guaranteed to be pointed to by the 
corresponding tree node. 


The traversal code can then be called recursively in an object-oriented style: 

    function bool bvh-node::hit(ray a + tb, real t_0, real t_1, hit-record rec)
        if (bbox.hitbox(a + tb, t_0, t_1)) then
            hit-record lrec, rrec
            left-hit = (left ≠ NULL) and (left → hit(a + tb, t_0, t_1, lrec))
            right-hit = (right ≠ NULL) and (right → hit(a + tb, t_0, t_1, rrec))
            if (left-hit and right-hit) then
                if (lrec.t < rrec.t) then
                    rec = lrec
                else
                    rec = rrec
                return true
            else if (left-hit) then
                rec = lrec
                return true
            else if (right-hit) then
                rec = rrec
                return true
            else
                return false
        else
            return false

Note that because left and right point to surfaces rather than bvh-nodes 
specifically, we can let the virtual functions take care of distinguishing between 
internal and leaf nodes; the appropriate hit function will be called. Note that 
if the tree is built properly, we can eliminate the check for left being 
NULL. If we want to eliminate the check for right being NULL, we can 
replace NULL right pointers with a redundant pointer to left. This will 
end up checking left twice, but will eliminate the check throughout 
the tree. Whether that is worth it will depend on the details of tree 
construction. 

There are many ways to build a tree for a bounding volume hierarchy. It is 
convenient to make the tree binary, roughly balanced, and to have the boxes of 
sibling subtrees not overlap too much. A heuristic to accomplish this is to sort 
the surfaces along an axis before dividing them into two sublists. If the axes are 
defined by an integer with x = 0, y = 1, and z = 2, we have: 

    function bvh-node::create(object-array A, int AXIS)
        N = A.length
        if (N = 1) then
            left = A[0]
            right = NULL
            bbox = bounding-box(A[0])
        else if (N = 2) then
            left = A[0]
            right = A[1]
            bbox = combine(bounding-box(A[0]), bounding-box(A[1]))
        else
            sort A by the object center along AXIS
            left = new bvh-node(A[0..N/2 - 1], (AXIS + 1) mod 3)
            right = new bvh-node(A[N/2..N - 1], (AXIS + 1) mod 3)
            bbox = combine(left → bbox, right → bbox)

The quality of the tree can be improved by carefully choosing AXIS each time. 
One way to do this is to choose the axis such that the sum of the volumes of the 
bounding boxes of the two subtrees is minimized. This change compared to 
rotating through the axes will make little difference for scenes composed of 
isotropically distributed small objects, but it may help significantly in less well-behaved 
scenes. This code can also be made more efficient by doing just a partition rather 
than a full sort. 
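
The traversal and the sort-based construction can be combined into a short, self-contained Python sketch. Several choices here are assumptions made for illustration: spheres serve as the only primitive, `hit` returns the ray parameter of the nearest intersection (or `None`) instead of filling a hit record, boxes are `(lo, hi)` tuples, and the split axis simply cycles.

```python
import math

def hits_box(lo, hi, o, d, t0, t1):
    """Slab test: does o + t*d pass through the box for some t in [t0, t1]?"""
    for l, h, e, dd in zip(lo, hi, o, d):
        a = math.copysign(math.inf, dd) if dd == 0.0 else 1.0 / dd
        near, far = a * (l - e), a * (h - e)
        if a < 0.0:
            near, far = far, near
        t0, t1 = max(t0, near), min(t1, far)
        if t0 > t1:
            return False
    return True

def combine(b1, b2):
    return tuple(map(min, b1[0], b2[0])), tuple(map(max, b1[1], b2[1]))

class Sphere:
    def __init__(self, center, radius):
        self.center, self.radius = center, radius

    def bbox(self):
        return (tuple(c - self.radius for c in self.center),
                tuple(c + self.radius for c in self.center))

    def hit(self, o, d, t0, t1):
        oc = [p - c for p, c in zip(o, self.center)]
        A = sum(x * x for x in d)
        B = 2.0 * sum(x * y for x, y in zip(oc, d))
        C = sum(x * x for x in oc) - self.radius ** 2
        disc = B * B - 4.0 * A * C
        if disc < 0.0:
            return None
        root = math.sqrt(disc)
        for t in ((-B - root) / (2.0 * A), (-B + root) / (2.0 * A)):
            if t0 <= t <= t1:
                return t
        return None

class BVHNode:
    def __init__(self, objects, axis=0):
        n = len(objects)
        if n == 1:
            self.left, self.right = objects[0], None
        elif n == 2:
            self.left, self.right = objects
        else:
            # Sort by box center along the axis, then split at the median.
            key = lambda s: s.bbox()[0][axis] + s.bbox()[1][axis]
            objects = sorted(objects, key=key)
            nxt = (axis + 1) % 3
            self.left = BVHNode(objects[:n // 2], nxt)
            self.right = BVHNode(objects[n // 2:], nxt)
        self.box = self.left.bbox()
        if self.right is not None:
            self.box = combine(self.box, self.right.bbox())

    def bbox(self):
        return self.box

    def hit(self, o, d, t0, t1):
        if not hits_box(self.box[0], self.box[1], o, d, t0, t1):
            return None
        hits = [c.hit(o, d, t0, t1) for c in (self.left, self.right) if c is not None]
        hits = [t for t in hits if t is not None]
        return min(hits) if hits else None   # closest of the child hits
```

Because `left` and `right` may be either a `Sphere` or another `BVHNode`, the same `hit` call serves internal and leaf nodes, which mirrors the role of the virtual functions in the pseudocode.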

Another, and probably better, way to build the tree is to have the subtrees 
contain about the same amount of space rather than the same number of objects. 
To do this we partition the list based on space: 

    function bvh-node::create(object-array A, int AXIS)
        N = A.length
        if (N = 1) then
            left = A[0]
            right = NULL
            bbox = bounding-box(A[0])
        else if (N = 2) then
            left = A[0]
            right = A[1]
            bbox = combine(bounding-box(A[0]), bounding-box(A[1]))
        else
            find the midpoint m of the bounding box of A along AXIS
            partition A into lists with lengths k and (N - k) surrounding m
            left = new bvh-node(A[0..k - 1], (AXIS + 1) mod 3)
            right = new bvh-node(A[k..N - 1], (AXIS + 1) mod 3)
            bbox = combine(left → bbox, right → bbox)

Although this results in an unbalanced tree, it allows for easy traversal of empty 
space and is cheaper to build because partitioning is cheaper than sorting. 

Figure 12.28. In uniform spatial subdivision, the ray is tracked forward through cells until 
an object in one of those cells is hit. In this example, only objects in the shaded cells are 
checked. 

12.3.3 Uniform Spatial Subdivision 

Another strategy to reduce intersection tests is to divide space. This is 
fundamentally different from dividing objects as was done with hierarchical bounding 
volumes: 

• In hierarchical bounding volumes, each object belongs to one of two sibling 
nodes, whereas a point in space may be inside both sibling nodes. 

• In spatial subdivision, each point in space belongs to exactly one node, 
whereas objects may belong to many nodes. 

In uniform spatial subdivision, the scene is partitioned into axis-aligned boxes. 
These boxes are all the same size, although they are not necessarily cubes. The 
ray traverses these boxes as shown in Figure 12.28. When an object is hit, the 
traversal ends. 

The grid itself should be a subclass of surface and should be implemented as 
a 3D array of pointers to surface. For empty cells these pointers are NULL. For 
cells with one object, the pointer points to that object. For cells with more than 
one object, the pointer can point to a list, another grid, or another data structure, 
such as a bounding volume hierarchy. 

This traversal is done in an incremental fashion. The regularity comes from 
the way that a ray hits each set of parallel planes, as shown in Figure 12.29. To 
see how this traversal works, first consider the 2D case where the ray direction 
has positive x and y components and starts outside the grid. Assume the grid is 
bounded by points $(x_{\min}, y_{\min})$ and $(x_{\max}, y_{\max})$. The grid has $n_x \times n_y$ cells. 

Figure 12.29. Although the pattern of cell hits seems irregular (left), the hits on sets of 
parallel planes are very even. 

Our first order of business is to find the index $(i, j)$ of the first cell hit by 
the ray $\mathbf{e} + t\mathbf{d}$. Then, we need to traverse the cells in an appropriate order. The 
key parts to this algorithm are finding the initial cell $(i, j)$ and deciding whether 
to increment i or j (Figure 12.30). Note that when we check for an intersection 
with objects in a cell, we restrict the range of t to be within the cell (Figure 12.31). 
Most implementations make the 3D array of type “pointer to surface.” To improve 
the locality of the traversal, the array can be tiled as discussed in Section 12.5. 
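
The incremental traversal for the 2D case can be sketched as follows. This is one possible formulation, under the same assumption as the text that both direction components are positive; the function name `grid_cells` and its argument layout are invented for this example, and it only lists the visited cells rather than intersecting any objects.

```python
def grid_cells(origin, direction, grid_min, grid_max, nx, ny):
    """Cells (i, j) pierced by the ray, in order. Assumes dx > 0 and dy > 0."""
    (ex, ey), (dx, dy) = origin, direction
    (x0, y0), (x1, y1) = grid_min, grid_max
    # Parameter range over which the ray is inside the grid's bounding box.
    t_enter = max((x0 - ex) / dx, (y0 - ey) / dy, 0.0)
    t_exit = min((x1 - ex) / dx, (y1 - ey) / dy)
    if t_enter >= t_exit:
        return []
    cw, ch = (x1 - x0) / nx, (y1 - y0) / ny          # cell width and height
    px, py = ex + t_enter * dx, ey + t_enter * dy    # entry point
    i = min(int((px - x0) / cw), nx - 1)
    j = min(int((py - y0) / ch), ny - 1)
    # Ray parameters at the next vertical and horizontal cell boundaries.
    t_next_x = (x0 + (i + 1) * cw - ex) / dx
    t_next_y = (y0 + (j + 1) * ch - ey) / dy
    dt_x, dt_y = cw / dx, ch / dy    # even spacing between successive crossings
    cells = []
    while i < nx and j < ny:
        cells.append((i, j))
        if t_next_x < t_next_y:      # the vertical boundary comes first: step right
            i += 1
            t_next_x += dt_x
        else:                        # otherwise step up
            j += 1
            t_next_y += dt_y
    return cells
```

The constant increments `dt_x` and `dt_y` are where the regular spacing of hits on each set of parallel planes pays off: after the first cell is found, each step costs one comparison and one addition. In a ray tracer, the objects in each visited cell would be tested with t restricted to the portion of the ray inside that cell.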

12.3.4 Axis-Aligned Binary Space Partitioning 

We can also partition space in a hierarchical data structure such as a binary space 
partitioning tree (BSP tree). This is similar to the BSP tree used for visibility 
sorting in Section 12.4, but it’s most common to use axis-aligned, rather than 
polygon-aligned, cutting planes for ray intersection. 

A node in this structure contains a single cutting plane and a left and right 
subtree. Each subtree contains all the objects on one side of the cutting plane. 
Objects that pass through the plane are stored in both subtrees. If we assume 
the cutting plane is parallel to the yz plane at x = D, then the node class is: 

    class bsp-node subclass of surface
        virtual bool hit(ray e + td, real t_0, real t_1, hit-record rec)
        virtual box bounding-box()
        surface-pointer left
        surface-pointer right
        real D

Figure 12.30. To decide whether we advance right or upwards, we keep track 
of the intersections with the next vertical and horizontal boundary of the cell. 

Figure 12.31. Only hits within the cell should be reported. Otherwise the case 
above would cause us to report hitting object b rather than object a. 

Figure 12.32. The four cases of how a ray relates to the BSP cutting plane 
x = D. 

We generalize this to y and z cutting planes later. The intersection code can then 
be called recursively in an object-oriented style. The code considers the four 
cases shown in Figure 12.32. For our purposes, the origin of these rays is a point 
at parameter $t_0$: 

$$\mathbf{p} = \mathbf{a} + t_0 \mathbf{b}.$$


The four cases are: 


1. The ray only interacts with the left subtree, and we need not test it for 
intersection with the cutting plane. It occurs for $x_p < D$ and $x_b < 0$. 

2. The ray is tested against the left subtree, and if there are no hits, it is then 
tested against the right subtree. We need to find the ray parameter at x = D, 
so we can make sure we only test for intersections within the subtree. This 
case occurs for $x_p < D$ and $x_b > 0$. 

3. This case is analogous to case 1 and occurs for $x_p > D$ and $x_b > 0$. 

4. This case is analogous to case 2 and occurs for $x_p > D$ and $x_b < 0$. 

The resulting traversal code handling these cases in order is: 

    function bool bsp-node::hit(ray a + tb, real t_0, real t_1, hit-record rec)
        x_p = x_a + t_0 x_b
        if (x_p < D) then
            if (x_b < 0) then
                return (left ≠ NULL) and (left → hit(a + tb, t_0, t_1, rec))
            t = (D - x_a)/x_b
            if (t > t_1) then
                return (left ≠ NULL) and (left → hit(a + tb, t_0, t_1, rec))
            if (left ≠ NULL) and (left → hit(a + tb, t_0, t, rec)) then
                return true
            return (right ≠ NULL) and (right → hit(a + tb, t, t_1, rec))
        else
            analogous code for cases 3 and 4


This is very clean code. However, to get it started, we need to hit some root object 
that includes a bounding box so we can initialize the traversal, $t_0$ and $t_1$. An issue 
we have to address is that the cutting plane may be along any axis. We can add 
an integer index axis to the bsp-node class. If we allow an indexing operator 
for points, this will result in some simple modifications to the code above; for 
example, 

    x_p = x_a + t_0 x_b

would become 

    u_p = a[axis] + t_0 b[axis]

which will result in some additional array indexing, but will not generate more 
branches. 

While the processing of a single bsp-node is faster than processing a bvh-node, 
the fact that a single surface may exist in more than one subtree means there are 
more nodes and, potentially, a higher memory use. How “well” the trees are built 
determines which is faster. Building the tree is similar to building the BVH tree. 
We can pick axes to split in a cycle, and we can split in half each time, or we can 
try to be more sophisticated in how we divide. 
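
A self-contained Python sketch of the axis-aligned BSP traversal, with all four cases and an axis index, is given below. The details are assumptions chosen to keep the example small: spheres are the only primitive, `hit` returns the nearest ray parameter or `None`, empty subtrees are represented by empty leaves instead of NULL pointers, a ray parallel to the cutting plane is sent to the side it starts on, and the builder splits at the median center while cycling through the axes.

```python
import math

class Sphere:
    def __init__(self, center, radius):
        self.center, self.radius = center, radius

    def hit(self, o, d, t0, t1):
        oc = [p - c for p, c in zip(o, self.center)]
        A = sum(x * x for x in d)
        B = 2.0 * sum(x * y for x, y in zip(oc, d))
        C = sum(x * x for x in oc) - self.radius ** 2
        disc = B * B - 4.0 * A * C
        if disc < 0.0:
            return None
        root = math.sqrt(disc)
        for t in ((-B - root) / (2.0 * A), (-B + root) / (2.0 * A)):
            if t0 <= t <= t1:
                return t
        return None

class Leaf:
    def __init__(self, objects):
        self.objects = objects

    def hit(self, o, d, t0, t1):
        hits = [t for t in (s.hit(o, d, t0, t1) for s in self.objects) if t is not None]
        return min(hits) if hits else None

class BSPNode:
    def __init__(self, axis, D, left, right):
        self.axis, self.D, self.left, self.right = axis, D, left, right

    def hit(self, o, d, t0, t1):
        u_p = o[self.axis] + t0 * d[self.axis]    # ray start along the split axis
        u_b = d[self.axis]
        if u_p < self.D:
            near, far, leaving = self.left, self.right, u_b <= 0.0   # cases 1 and 2
        else:
            near, far, leaving = self.right, self.left, u_b >= 0.0   # cases 3 and 4
        if leaving:                   # never crosses the plane: one subtree only
            return near.hit(o, d, t0, t1)
        t = (self.D - o[self.axis]) / u_b         # parameter at the cutting plane
        if t > t1:
            return near.hit(o, d, t0, t1)
        found = near.hit(o, d, t0, t)
        return found if found is not None else far.hit(o, d, t, t1)

def build(objects, axis=0, depth=8):
    if len(objects) <= 2 or depth == 0:
        return Leaf(objects)
    centers = sorted(s.center[axis] for s in objects)
    D = centers[len(centers) // 2]
    # Objects that pass through the plane go into both subtrees.
    left = [s for s in objects if s.center[axis] - s.radius <= D]
    right = [s for s in objects if s.center[axis] + s.radius >= D]
    if len(left) == len(objects) or len(right) == len(objects):
        return Leaf(objects)          # the split made no progress
    nxt = (axis + 1) % 3
    return BSPNode(axis, D, build(left, nxt, depth - 1), build(right, nxt, depth - 1))
```

Restricting the near subtree to the range up to t is what makes the early return safe: an object stored in both subtrees can only report a hit on the near side of the plane there, and any hit beyond the plane is found by the far subtree.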


12.4 BSP Trees for Visibility 

Another geometric problem in which spatial data structures can be used is 
determining the visibility ordering of objects in a scene with changing viewpoint. 

If we are making many images of a fixed scene composed of planar polygons, 
from different viewpoints (as is often the case for applications such as games), 
we can use a binary space partitioning scheme closely related to the method for 
ray intersection discussed in the previous section. The difference is that for 
visibility sorting we use non-axis-aligned splitting planes, so that the planes can be 
made coincident with the polygons. This leads to an elegant algorithm known as 
the BSP tree algorithm to order the surfaces from front to back. The key aspect 
of the BSP tree is that it uses a preprocess to create a data structure that is useful 
for any viewpoint. So, as the viewpoint changes, the same data structure is used 
without change. 


12.4.1 Overview of BSP Tree Algorithm 

The BSP tree algorithm is an example of a painter’s algorithm. A painter’s 
algorithm draws every object from back to front, with each new polygon potentially 
overdrawing previous polygons, as is shown in Figure 12.33. It can be 
implemented as follows: 

    sort objects back to front relative to viewpoint
    for each object do
        draw object on screen

Figure 12.33. A painter’s algorithm starts with a blank image and then draws the scene one 
object at a time from back to front, overdrawing whatever is already there. This automatically 
eliminates hidden surfaces. 

Figure 12.34. A cycle occurs if a global back-to-front ordering is not possible for 
a particular eye position. 


The problem with the first step (the sort) is that the relative order of multiple 
objects is not always well defined, even if the order of every pair of objects is. 
This problem is illustrated in Figure 12.34 where the three triangles form a cycle. 

The BSP tree algorithm works on any scene composed of polygons where 
no polygon crosses the plane defined by any other polygon. This restriction is 
then relaxed by a preprocessing step. For the rest of this discussion, triangles are 
assumed to be the only primitive, but the ideas extend to arbitrary polygons. 

The basic idea of the BSP tree can be illustrated with two triangles, $T_1$ and 
$T_2$. We first recall (see Section 2.5.3) the implicit plane equation of the plane 
containing $T_1$: $f_1(\mathbf{p}) = 0$. The key property of implicit planes that we wish to 
take advantage of is that for all points $\mathbf{p}^+$ on one side of the plane, $f_1(\mathbf{p}^+) > 0$; 
and for all points $\mathbf{p}^-$ on the other side of the plane, $f_1(\mathbf{p}^-) < 0$. Using this 
property, we can find out on which side of the plane $T_2$ lies. Again, this assumes 
all three vertices of $T_2$ are on the same side of the plane. For discussion, assume 
that $T_2$ is on the $f_1(\mathbf{p}) < 0$ side of the plane. Then, we can draw $T_1$ and $T_2$ in the 
right order for any eyepoint e: 

    if (f_1(e) < 0) then
        draw T_1
        draw T_2
    else
        draw T_2
        draw T_1

The reason this works is that if $T_2$ and e are on the same side of the plane 
containing $T_1$, there is no way for $T_2$ to be fully or partially blocked by $T_1$ as seen 
from e, so it is safe to draw $T_1$ first. If e and $T_2$ are on opposite sides of the 
plane containing $T_1$, then $T_2$ cannot fully or partially block $T_1$, and the opposite 
drawing order is safe (Figure 12.35). 

This observation can be generalized to many objects provided none of them 
span the plane defined by T\. If we use a binary tree data structure with T) 
as root, the negative branch of the tree contains all the triangles whose 
vertices have /i(p) < 0, and the positive branch of the tree contains all the 
triangles whose vertices have /,(p) > 0. We can draw in proper order 

as follows: 

function draw(bsptree tree, point e)
    if (tree.empty) then
        return
    if (f_tree.root(e) < 0) then
        draw(tree.plus, e)
        rasterize tree.triangle
        draw(tree.minus, e)
    else
        draw(tree.minus, e)
        rasterize tree.triangle
        draw(tree.plus, e)

Figure 12.35. When e and T_2 are on opposite sides of the plane containing T_1, then it is
safe to draw T_2 first and T_1 second. If e and T_2 are on the same side of the plane, then T_1
should be drawn before T_2. This is the core idea of the BSP tree algorithm.
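
The traversal above can be sketched in Python. This is only a minimal illustration under stated assumptions: the names BspNode, draw, and the rasterize callback are hypothetical, the plane function f is evaluated from the node's own triangle, and the tree is assumed to have been built already with no triangle spanning another's plane.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Vec3 = Tuple[float, float, float]
Triangle = Tuple[Vec3, Vec3, Vec3]


def sub(u: Vec3, v: Vec3) -> Vec3:
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def dot(u: Vec3, v: Vec3) -> float:
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def cross(u: Vec3, v: Vec3) -> Vec3:
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


@dataclass
class BspNode:
    """Hypothetical node type: one triangle plus two optional subtrees."""
    triangle: Triangle
    minus: Optional["BspNode"] = None  # triangles on the f < 0 side
    plus: Optional["BspNode"] = None   # triangles on the f > 0 side

    def f(self, p: Vec3) -> float:
        # Implicit plane function of the plane containing this node's triangle.
        a, b, c = self.triangle
        return dot(cross(sub(b, a), sub(c, a)), sub(p, a))


def draw(tree: Optional[BspNode], e: Vec3,
         rasterize: Callable[[Triangle], None]) -> None:
    """Back-to-front traversal for eyepoint e."""
    if tree is None:
        return
    if tree.f(e) < 0:
        # Eye is on the negative side: the positive side is farther away.
        draw(tree.plus, e, rasterize)
        rasterize(tree.triangle)
        draw(tree.minus, e, rasterize)
    else:
        draw(tree.minus, e, rasterize)
        rasterize(tree.triangle)
        draw(tree.plus, e, rasterize)
```

Passing a callback for rasterize keeps the ordering logic separate from any particular renderer; a test can simply record the order in which triangles arrive.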

The nice thing about that code is that it will work for any viewpoint e, so the 
tree can be precomputed. Note that, if each subtree is itself a tree, where the root 
triangle divides the other triangles into two groups relative to the plane containing 
it, the code will work as is. It can be made slightly more efficient by terminating
the recursive calls one level higher, but the code will still be simple. A tree
illustrating this code is shown in Figure 12.36. As discussed in Section 2.5.5, the
implicit equation for a point p on a plane containing three non-collinear points a,
b, and c is


    f(p) = ((b - a) × (c - a)) · (p - a) = 0.    (12.1)



Figure 12.36. Three triangles and a BSP tree that is valid for them. The “positive” and
“negative” branches are encoded by right and left subtree position, respectively.

It can be faster to store the (A, B, C, D) of the implicit equation of the form

    f(x, y, z) = Ax + By + Cz + D = 0.    (12.2)

Equations (12.1) and (12.2) are equivalent, as is clear when you recall that the
gradient of the implicit equation is the normal to the triangle. The gradient of
Equation (12.2) is n = (A, B, C), which is just the normal vector

    n = (b - a) × (c - a).

We can solve for D by plugging in any point on the plane, e.g., a:

    D = -A x_a - B y_a - C z_a
      = -n · a.

This suggests the form:

    f(p) = n · p - n · a
         = n · (p - a)
         = 0,

which is the same as Equation (12.1) once you recall that n is computed using the 
cross product. Which form of the plane equation you use and whether you store 
only the vertices, n and the vertices, or n, D, and the vertices, is probably a matter 
of taste—a classic time-storage tradeoff that will be settled best by profiling. For 
debugging, using Equation (12.1) is probably the best. 
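
As a small check of the equivalence, the two forms can be evaluated side by side. The sketch below is an illustration only; the function names are hypothetical and points are plain 3-tuples.

```python
def sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def plane_from_triangle(a, b, c):
    """Return (n, D) with n = (A, B, C), as in Equation (12.2)."""
    n = cross(sub(b, a), sub(c, a))
    D = -dot(n, a)
    return n, D


def f_from_vertices(a, b, c, p):
    """Equation (12.1): recompute the normal on every evaluation."""
    return dot(cross(sub(b, a), sub(c, a)), sub(p, a))


def f_from_stored(n, D, p):
    """Equation (12.2): use the precomputed n and D."""
    return dot(n, p) + D
```

Storing (n, D) trades four extra numbers per triangle for one dot product and one addition per evaluation, instead of a cross product plus a dot product.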

The only issue that prevents the code above from working in general is that 
one cannot guarantee that a triangle can be uniquely classified on one side of a 
plane or the other. It can have two vertices on one side of the plane and the third 
on the other. Or it can have vertices on the plane. This is handled by splitting the 
triangle into smaller triangles using the plane to “cut” them. 


12.4.2 Building the Tree 

If none of the triangles in the dataset cross each other’s planes, so that all triangles 
are on one side of all other triangles, a BSP tree that can be traversed using the 
code above can be built using the following algorithm: 

tree-root = node(T_1)
for i ∈ {2, ..., N} do
    tree-root.add(T_i)

Figure 12.37. When a triangle spans a plane, there will be one vertex on one
side and two on the other.

Figure 12.38. When a triangle is cut, we break it into three triangles, none
of which span the cutting plane.


function add(triangle T)
    if (f(a) < 0 and f(b) < 0 and f(c) < 0) then
        if (negative subtree is empty) then
            negative-subtree = node(T)
        else
            negative-subtree.add(T)
    else if (f(a) > 0 and f(b) > 0 and f(c) > 0) then
        if (positive subtree is empty) then
            positive-subtree = node(T)
        else
            positive-subtree.add(T)
    else
        we have assumed this case is impossible

The only thing we need to fix is the case where the triangle crosses the dividing 
plane, as shown in Figure 12.37. Assume, for simplicity, that the triangle has 
vertices a and b on one side of the plane, and vertex c is on the other side. In this 
case, we can find the intersection points A and B and cut the triangle into three 
new triangles with vertices 


    T_1 = (a, b, A),
    T_2 = (b, B, A),
    T_3 = (A, B, c),

as shown in Figure 12.38. This order of vertices is important so that the direction
of the normal remains the same as for the original triangle. If we assume that
f(c) < 0, the following code could add these three triangles to the tree assuming
the positive and negative subtrees are not empty:

positive-subtree.add(T_1)
positive-subtree.add(T_2)
negative-subtree.add(T_3)

A precision problem that will plague a naive implementation occurs when a vertex 
is very near the splitting plane. For example, if we have two vertices on one side of 
the splitting plane and the other vertex is only an extremely small distance on the 
other side, we will create a new triangle almost the same as the old one, a triangle 
that is a sliver, and a triangle of almost zero size. It would be better to detect this 
as a special case and not split into three new triangles. One might expect this case 
to be rare, but because many models have tessellated planes and triangles with
shared vertices, it occurs frequently, and thus must be handled carefully. Some
simple manipulations that accomplish this are:

function add(triangle T)
    fa = f(a)
    fb = f(b)
    fc = f(c)
    if (abs(fa) < ε) then
        fa = 0
    if (abs(fb) < ε) then
        fb = 0
    if (abs(fc) < ε) then
        fc = 0
    if (fa ≤ 0 and fb ≤ 0 and fc ≤ 0) then
        if (negative subtree is empty) then
            negative-subtree = node(T)
        else
            negative-subtree.add(T)
    else if (fa ≥ 0 and fb ≥ 0 and fc ≥ 0) then
        if (positive subtree is empty) then
            positive-subtree = node(T)
        else
            positive-subtree.add(T)
    else
        cut triangle into three triangles and add to each side

This takes any vertex whose f value is within ε of the plane and counts it as
positive or negative. The constant ε is a small positive real chosen by the user.
The technique above is a rare instance where testing for floating-point equality is
useful and works because the zero value is set rather than being computed.
Comparing for equality with a computed floating-point value is almost never advisable,
but we are not doing that.
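
The snapping step can be isolated into a small classification routine. The following Python sketch is one way to write it, assuming non-strict comparisons so that a snapped zero is counted with the side of the remaining vertices; the function name, the string labels, and the default tolerance are hypothetical. A triangle whose three values all snap to zero (coplanar with the splitting plane) falls into the negative branch here, which is an arbitrary choice.

```python
def classify(fa: float, fb: float, fc: float, eps: float = 1e-6) -> str:
    """Classify a triangle against a plane from its three f values.

    Values within eps of the plane are set to exactly zero first, so the
    later comparisons against zero are against a value that was assigned,
    not computed.
    """
    fa, fb, fc = (0.0 if abs(v) < eps else v for v in (fa, fb, fc))
    if fa <= 0 and fb <= 0 and fc <= 0:
        return "negative"
    if fa >= 0 and fb >= 0 and fc >= 0:
        return "positive"
    return "cut"
```

For example, classify(1.0, 2.0, -1e-9) reports "positive" rather than producing a sliver and a near-zero-area triangle.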

12.4.3 Cutting Triangles 

Filling out the details of the last case, “cut triangle into three triangles and add to
each side,” is straightforward but tedious. Because BSP tree construction is a
preprocess where the highest efficiency is not key, we should instead aim for
clean, compact code. A nice trick is to force many of the
cases into one by ensuring that c is on one side of the plane and the other two
vertices are on the other. This is easily done with swaps. Filling out the details
in the final else statement (assuming the subtrees are non-empty for simplicity)
gives:

if (fa * fc ≥ 0) then
    swap(fb, fc)
    swap(b, c)
    swap(fa, fb)
    swap(a, b)
else if (fb * fc ≥ 0) then
    swap(fa, fc)
    swap(a, c)
    swap(fa, fb)
    swap(a, b)
compute A
compute B
T_1 = (a, b, A)
T_2 = (b, B, A)
T_3 = (A, B, c)
if (fc ≥ 0) then
    negative-subtree.add(T_1)
    negative-subtree.add(T_2)
    positive-subtree.add(T_3)
else
    positive-subtree.add(T_1)
    positive-subtree.add(T_2)
    negative-subtree.add(T_3)

This code takes advantage of the fact that the product of two numbers is positive if they
have the same sign; this is the test in the first if statement. If vertices are swapped, we must
do two swaps to keep the vertices ordered counterclockwise. Note that exactly 
one of the vertices may lie exactly on the plane, in which case the code above will 
work, but one of the generated triangles will have zero area. This can be handled 
by ignoring the possibility, which is not that risky, because the rasterization code 
must handle zero-area triangles in screen space (i.e., edge-on triangles). You can 
also add a check that does not add zero-area triangles to the tree. Finally, you can 
put in a special case for when exactly one of fa, fb, and fc is zero which cuts the 
triangle into two triangles. 

To compute A and B, a line segment and implicit plane intersection is needed.
For example, the parametric line connecting a and c is

    p(t) = a + t(c - a).

The point of intersection with the plane n · p + D = 0 is found by plugging p(t)
into the plane equation:

    n · (a + t(c - a)) + D = 0,

and solving for t:

    t = -\frac{n · a + D}{n · (c - a)}.

Calling this solution t_A, we can write the expression for A:

    A = a + t_A (c - a).

A similar computation will give B.
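
The intersection and the three-way cut can be sketched together in Python. This is an illustration under stated assumptions, not a complete implementation: the helper names are hypothetical, the plane is given as (n, D), B is taken as the intersection of edge b–c with the plane, and the caller is assumed to have already arranged (by the swaps described above) that a and b lie on one side and c strictly on the other.

```python
def sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def intersect(a, c, n, D):
    """Point where segment a-c meets the plane n . p + D = 0.

    Assumes the segment actually crosses the plane, so the denominator
    n . (c - a) is nonzero.
    """
    ca = sub(c, a)
    t = -(dot(n, a) + D) / dot(n, ca)
    return (a[0] + t * ca[0], a[1] + t * ca[1], a[2] + t * ca[2])


def cut(a, b, c, n, D):
    """Split triangle (a, b, c) into three triangles that do not span the plane.

    The vertex order of each output triangle preserves the orientation
    of the input triangle.
    """
    A = intersect(a, c, n, D)
    B = intersect(b, c, n, D)
    return (a, b, A), (b, B, A), (A, B, c)
```

With the plane z = 0 (n = (0, 0, 1), D = 0), cutting the triangle (0, 0, 1), (1, 0, 1), (0, 1, -1) gives A = (0, 0.5, 0) and B = (0.5, 0.5, 0).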


12.4.4 Optimizing the Tree 

The efficiency of tree creation is much less of a concern than tree traversal because 
it is a preprocess. The traversal of the BSP tree takes time proportional to the 
number of nodes in the tree. (How well balanced the tree is does not matter.) 
There will be one node for each triangle, including the triangles that are created 
as a result of splitting. This number can depend on the order in which triangles 
are added to the tree. For example, in Figure 12.39, if T_1 is the root, there will be
two nodes in the tree, but if T_2 is the root, there will be more nodes, because T_1
will be split.

It is difficult to find the “best” order of triangles to add to the tree. For N
triangles, there are N! orderings that are possible. So trying all orderings is not
usually feasible. Alternatively, some predetermined number of orderings can be 
tried from a random collection of permutations, and the best one can be kept for 
the final tree. 

The splitting algorithm described above splits one triangle into three triangles.
It could be more efficient to split a triangle into a triangle and a convex
quadrilateral. This is probably not worth it if all input models have only
triangles, but would be easy to support for implementations that accommodate 
arbitrary polygons. 


12.5 Tiling Multidimensional Arrays 

Figure 12.39. Using T_1 as the root of a BSP tree will result in a tree with two
nodes. Using T_2 as the root will require a cut and thus make a larger tree.

Effectively utilizing the memory hierarchy is a crucial task in designing
algorithms for modern architectures. Making sure that multidimensional arrays have
data in a “nice” arrangement is accomplished by tiling, sometimes also called
bricking. A traditional 2D array is stored as a 1D array together with an indexing
mechanism; for example, an N_x by N_y array is stored in a 1D array of length
N_x N_y and the 2D index (x, y) (which runs from (0, 0) to (N_x - 1, N_y - 1)) maps
to the 1D index (running from 0 to N_x N_y - 1) using the formula

    index = x + N_x y.

An example of how that memory lays out is shown in Figure 12.40. A problem
with this layout is that although two adjacent array elements that are in the same
row are next to each other in memory, two adjacent elements in the same column
will be separated by N_x elements in memory. This can cause poor memory
locality for large N_x. The standard solution to this is to use tiles to make memory
locality for rows and columns more equal. An example is shown in Figure 12.41
where 2 × 2 tiles are used. The details of indexing such an array are discussed in
the next section. A more complicated example, with two levels of tiling on a 3D
array, is covered after that.

Figure 12.40. The memory layout for an untiled 2D array with N_x = 4 and N_y =

Figure 12.41. The memory layout for a tiled 2D array with N_x = 4 and N_y = 3
and 2 × 2 tiles. Note that padding on the top of the array is needed because N_y
is not a multiple of the tile size two.

A key question is what size to make the tiles. In practice, they should be 
similar to the memory-unit size on the machine. For example, if we are using 
16-bit (2-byte) data values on a machine with 128-byte cache lines, 8 × 8 tiles fit
exactly in a cache line. However, using 32-bit floating-point numbers, which fit
32 elements to a cache line, 5 × 5 tiles are a bit too small and 6 × 6 tiles are a
bit too large. Because there are also coarser-sized memory units such as pages, 
hierarchical tiling with similar logic can be useful. 


12.5.1 One-Level Tiling for 2D Arrays 

If we assume an N_x × N_y array decomposed into square n × n tiles (Figure 12.42),
then the number of tiles required is

    B_x = N_x / n,
    B_y = N_y / n.

Here, we assume that n divides N_x and N_y exactly. When this is not true, the
array should be padded. For example, if N_x = 15 and n = 4, then N_x should
be changed to 16.

Figure 12.42. A tiled 2D array composed of B_x × B_y tiles each of size n by n.

To work out a formula for indexing such an array, we first find
the tile indices (b_x, b_y) that give the row/column for the tiles (the tiles themselves
form a 2D array):

    b_x = x ÷ n,
    b_y = y ÷ n,

where ÷ is integer division, e.g., 12 ÷ 5 = 2. If we order the tiles along rows as
shown in Figure 12.40, then the index of the first element of the tile (b_x, b_y) is

    index = n^2 (B_x b_y + b_x).

The memory in that tile is arranged like a traditional 2D array as shown in
Figure 12.41. The partial offsets (x', y') inside the tile are

    x' = x mod n,
    y' = y mod n,

where mod is the remainder operator, e.g., 12 mod 5 = 2. Therefore, the offset
inside the tile is

    offset = y' n + x'.

Thus the full formula for finding the 1D index of element (x, y) in an N_x × N_y array
with n × n tiles is

    index = n^2 (B_x b_y + b_x) + y' n + x'
          = n^2 ((N_x ÷ n)(y ÷ n) + x ÷ n) + (y mod n) n + (x mod n).

This expression contains many integer multiplication, divide, and modulus
operations, which are costly on some processors. When n is a power of two, these
operations can be converted to bitshifts and bitwise logical operations. However,
as noted above, the ideal size is not always a power of two. Some of the
multiplications can be converted to shift/add operations, but the divide and modulus
operations are more problematic. The indices could be computed incrementally,
but this would require tracking counters, with numerous comparisons and poor
branch prediction performance.

However, there is a simple solution; note that the index expression can be
written as

    index = F_x(x) + F_y(y),

where

    F_x(x) = n^2 (x ÷ n) + (x mod n),
    F_y(y) = n^2 (N_x ÷ n)(y ÷ n) + (y mod n) n.

We tabulate F_x and F_y, and use x and y to find the index into the data array.
These tables will consist of N_x and N_y elements, respectively. The total size of
the tables will fit in the primary data cache of the processor, even for very large
data set sizes.
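
The table-based indexing can be sketched in a few lines of Python. This is a minimal illustration with hypothetical function names; it assumes the array dimensions have already been padded to multiples of n.

```python
def tiled_index(x: int, y: int, nx: int, n: int) -> int:
    """Direct evaluation of the one-level tiled index formula."""
    return n * n * ((nx // n) * (y // n) + x // n) + (y % n) * n + (x % n)


def make_tables(nx: int, ny: int, n: int):
    """Tabulate F_x and F_y so that index = fx[x] + fy[y]."""
    if nx % n != 0 or ny % n != 0:
        raise ValueError("pad nx and ny to multiples of n first")
    bx = nx // n  # number of tiles per row of tiles
    fx = [n * n * (x // n) + (x % n) for x in range(nx)]
    fy = [n * n * bx * (y // n) + (y % n) * n for y in range(ny)]
    return fx, fy
```

After the tables are built, each lookup costs two table reads and one addition, with no division or modulus in the inner loop.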


12.5.2 Example: Two-Level Tiling for 3D Arrays 


TLB: translation lookaside buffer, a cache that is part of the virtual memory
system.


Effective TLB utilization is also becoming a crucial factor in algorithm
performance. The same technique can be used to improve TLB hit rates in a 3D array
by creating m × m × m bricks of n × n × n cells. For example, a 40 × 20 × 19
volume could be decomposed into 4 × 2 × 2 macrobricks of 2 × 2 × 2 bricks of
5 × 5 × 5 cells. This corresponds to m = 2 and n = 5. Because 19 is not a
multiple of mn = 10, one level of padding is needed. Empirically useful sizes
are m = 5 for 16-bit datasets and m = 6 for float datasets.

The resulting index into the data array can be computed for any (x, y, z) triple
with the expression

    index = ((x ÷ n) ÷ m) n^3 m^3 ((N_z ÷ n) ÷ m)((N_y ÷ n) ÷ m)
          + ((y ÷ n) ÷ m) n^3 m^3 ((N_z ÷ n) ÷ m)
          + ((z ÷ n) ÷ m) n^3 m^3
          + ((x ÷ n) mod m) n^3 m^2
          + ((y ÷ n) mod m) n^3 m
          + ((z ÷ n) mod m) n^3
          + (x mod n) n^2
          + (y mod n) n
          + (z mod n),

where N_x, N_y, and N_z are the respective sizes of the dataset.

Note that, as in the simpler 2D one-level case, this expression can be written as

    index = F_x(x) + F_y(y) + F_z(z),

where

    F_x(x) = ((x ÷ n) ÷ m) n^3 m^3 ((N_z ÷ n) ÷ m)((N_y ÷ n) ÷ m)
           + ((x ÷ n) mod m) n^3 m^2
           + (x mod n) n^2,

    F_y(y) = ((y ÷ n) ÷ m) n^3 m^3 ((N_z ÷ n) ÷ m)
           + ((y ÷ n) mod m) n^3 m
           + (y mod n) n,

    F_z(z) = ((z ÷ n) ÷ m) n^3 m^3
           + ((z ÷ n) mod m) n^3
           + (z mod n).
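
The three tables for the two-level case can be built the same way. The Python sketch below is illustrative only; the function name is hypothetical, and the sizes are assumed to be padded to multiples of mn (so the 40 × 20 × 19 example becomes 40 × 20 × 20).

```python
def make_tables_3d(nx: int, ny: int, nz: int, n: int, m: int):
    """Tabulate F_x, F_y, F_z for m*m*m macrobricks of n*n*n-cell bricks."""
    mn = m * n
    if nx % mn or ny % mn or nz % mn:
        raise ValueError("pad all sizes to multiples of m * n first")
    by = (ny // n) // m  # macrobricks along y
    bz = (nz // n) // m  # macrobricks along z
    n3, m3 = n ** 3, m ** 3
    fx = [((x // n) // m) * n3 * m3 * bz * by
          + ((x // n) % m) * n3 * m * m
          + (x % n) * n * n
          for x in range(nx)]
    fy = [((y // n) // m) * n3 * m3 * bz
          + ((y // n) % m) * n3 * m
          + (y % n) * n
          for y in range(ny)]
    fz = [((z // n) // m) * n3 * m3
          + ((z // n) % m) * n3
          + (z % n)
          for z in range(nz)]
    return fx, fy, fz
```

A useful sanity check on any such layout is that fx[x] + fy[y] + fz[z] visits every index from 0 to N_x N_y N_z - 1 exactly once.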

Frequently Asked Questions 

• Does tiling really make that much difference in performance? 

On some volume rendering applications, a two-level tiling strategy made as much 
as a factor-of-ten performance difference. When the array does not fit in main 
memory, it can effectively prevent thrashing in some applications such as image 
editing. 

• How do I store the lists in a winged-edge structure? 

For most applications it is feasible to use arrays and indices for the references. 
However, if many delete operations are to be performed, then it is wise to use 
linked lists and pointers. 


Notes 

The discussion of the winged-edge data structure is based on the course notes of
Ching-Kuang Shene (Shene, 2003). There are smaller mesh data structures than
winged-edge. The trade-offs in using such structures are discussed in Directed
Edges—A Scalable Representation for Triangle Meshes (Campagna et al., 1998).
The tiled-array discussion is based on Interactive Ray Tracing for Volume
Visualization (Parker, Martin, et al., 1999). A structure similar to the triangle neighbor
structure is discussed in a technical report by Charles Loop (Loop, 2000). A
discussion of manifolds can be found in an introductory topology text (Munkres,
2000).

Exercises 

1. What is the memory difference for a simple tetrahedron stored as four
independent triangles and one stored in a winged-edge data structure?

2. Diagram a scene graph for a bicycle.

3. How many look-up tables are needed for a single-level tiling of an
n-dimensional array?

4. Given N triangles, what is the minimum number of triangles that could be
added to a resulting BSP tree? What is the maximum number?


More Ray Tracing


A ray tracer is a great substrate on which to build all kinds of advanced rendering
effects. Many effects that take significant work to fit into the object-order
rasterization framework, including basics like the shadows and reflections already
presented in Chapter 4, are simple and elegant in a ray tracer. In this chapter we
discuss some fancier techniques that can be used to ray-trace a wider variety of
scenes and to include a wider variety of effects. Some extensions allow more
general geometry: instancing and constructive solid geometry (CSG) are two ways
to make models more complex with minimal complexity added to the program.
Other extensions add to the range of materials we can handle: refraction through
transparent materials, like glass and water, and glossy reflections on a variety of
surfaces are essential for realism in many scenes.

This chapter also discusses the general framework of distribution ray
tracing (Cook et al., 1984), a powerful extension to the basic ray-tracing idea in which
multiple random rays are sent through each pixel in an image to produce images
with smooth edges and to simply and elegantly (if slowly) produce a wide range
of effects from soft shadows to camera depth-of-field.

The price of the elegance of ray tracing is exacted in terms of computer time: 
most of these extensions will trace a very large number of rays for any non-trivial 
scene. Because of this, it’s crucial to use the methods described in Chapter 12 to 
accelerate the tracing of rays. 


If you start with a brute-force ray intersection loop, you'll have ample time to
implement an acceleration structure while you wait for images to render.

Example values of n:
air: 1.00;
water: 1.33-1.34;
window glass: 1.51;
optical glass: 1.49-1.92;
diamond: 2.42.


13.1 Transparency and Refraction 


In Chapter 4 we discussed the use of recursive ray tracing to compute specular,
or mirror, reflection from surfaces. Another type of specular object is a
dielectric: a transparent material that refracts light. Diamonds, glass, water, and air are
dielectrics. Dielectrics also filter light; some glass filters out more red and blue
light than green light, so the glass takes on a green tint. When a ray travels from
a medium with refractive index n into one with a refractive index n_t, some of the
light is transmitted, and it bends. This is shown for n_t > n in Figure 13.1. Snell's
law tells us that

    n sin θ = n_t sin φ.

Computing the sine of an angle between two vectors is usually not as convenient
as computing the cosine, which is a simple dot product for the unit vectors such
as we have here. Using the trigonometric identity sin^2 θ + cos^2 θ = 1, we can
derive a refraction relationship for cosines:

    cos^2 φ = 1 - \frac{n^2 (1 - cos^2 θ)}{n_t^2}.

Note that if n and n_t are reversed, then so are θ and φ as shown on the right of
Figure 13.1.

To convert sin φ and cos φ into a 3D vector, we can set up a 2D orthonormal
basis in the plane of the surface normal, \mathbf{n}, and the ray direction, \mathbf{d}.
(Here \mathbf{n} denotes the unit normal vector, as distinct from the scalar
refractive index n.)

From Figure 13.2, we can see that \mathbf{n} and \mathbf{b} form an orthonormal basis for the
plane of refraction. By definition, we can describe the direction of the transmitted
ray, \mathbf{t}, in terms of this basis:

    \mathbf{t} = sin φ \mathbf{b} - cos φ \mathbf{n}.

Since we can describe \mathbf{d} in the same basis, and \mathbf{d} is known, we can solve for \mathbf{b}:

    \mathbf{d} = sin θ \mathbf{b} - cos θ \mathbf{n},

    \mathbf{b} = \frac{\mathbf{d} + \mathbf{n} cos θ}{sin θ}.

This means that we can solve for \mathbf{t} with known variables:

    \mathbf{t} = \frac{n (\mathbf{d} + \mathbf{n} cos θ)}{n_t} - \mathbf{n} cos φ
               = \frac{n (\mathbf{d} - \mathbf{n} (\mathbf{d} · \mathbf{n}))}{n_t}
                 - \mathbf{n} \sqrt{1 - \frac{n^2 (1 - (\mathbf{d} · \mathbf{n})^2)}{n_t^2}}.

Figure 13.1. Snell’s Law describes how the angle φ depends on the angle θ and the
refractive indices of the object and the surrounding medium.

Figure 13.2. The vectors \mathbf{n} and \mathbf{b} form a 2D orthonormal basis that is parallel to
the transmission vector \mathbf{t}.


Note that this equation works regardless of which of n and n_t is larger. An
immediate question is, “What should you do if the number under the square root is
negative?” In this case, there is no refracted ray and all of the energy is reflected. 
This is known as total internal reflection, and it is responsible for much of the 
rich appearance of glass objects. 

The reflectivity of a dielectric varies with the incident angle according to the
Fresnel equations. A nice way to implement something close to the Fresnel
equations is to use the Schlick approximation (Schlick, 1994a),

    R(θ) = R_0 + (1 - R_0)(1 - cos θ)^5,

where R_0 is the reflectance at normal incidence:

    R_0 = \left( \frac{n_t - 1}{n_t + 1} \right)^2.

Note that the cos θ terms above are always for the angle in air (the larger of the
internal and external angles relative to the normal).
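
The refraction formula and the Schlick approximation translate directly into code. The Python sketch below is an illustration under stated assumptions: the function names and argument order are hypothetical, d and the normal are unit vectors given as 3-tuples, the normal points against the incoming ray (d · normal < 0), and n and n_t are the refractive indices on the incoming and transmitted sides.

```python
import math


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def refract(d, normal, n, n_t):
    """Return the unit transmitted direction, or None on total internal reflection."""
    dn = dot(d, normal)
    # Quantity under the square root in the formula for t.
    under_root = 1.0 - (n * n) * (1.0 - dn * dn) / (n_t * n_t)
    if under_root < 0.0:
        return None  # total internal reflection: no refracted ray
    ratio = n / n_t
    root = math.sqrt(under_root)
    return tuple(ratio * (d[i] - normal[i] * dn) - normal[i] * root
                 for i in range(3))


def schlick(cos_theta, n_t):
    """Schlick approximation; cos_theta is the cosine of the angle in air."""
    r0 = ((n_t - 1.0) / (n_t + 1.0)) ** 2
    return r0 + (1.0 - r0) * (1.0 - cos_theta) ** 5
```

At normal incidence on glass with n_t = 1.5, the ray passes straight through and the reflectance is R_0 = 0.04; at grazing incidence the reflectance approaches 1.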

For homogeneous impurities, as are found in typical colored glass, a light-carrying
ray’s intensity will be attenuated according to Beer’s Law. As the ray
travels through the medium it loses intensity according to dI = -C I dx, where
dx is distance. Thus, dI/dx = -C I. We can solve this equation and get the
exponential I = k exp(-C x). The degree of attenuation is described by
the RGB attenuation constant a, which is the amount of attenuation after one
unit of distance. Putting in boundary conditions, we know that I(0) = I_0 and
I(1) = a I(0). The former implies I(x) = I_0 exp(-C x). The latter implies
I_0 a = I_0 exp(-C), so -C = ln(a). Thus, the final formula is

    I(s) = I(0) e^{ln(a) s},

where I(s) is the intensity of the beam at distance s from the interface. In practice,
we reverse-engineer a by eye, because such data is rarely easy to find. The effect
of Beer’s Law can be seen in Figure 13.3, where the glass takes on a green tint.

Figure 13.3. The color of the glass is affected by total internal reflection and Beer's Law.
The amount of light transmitted and reflected is determined by the Fresnel equations. The
complex lighting on the ground plane was computed using particle tracing as described in
Chapter 24. (See also Plate IV.)
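
As a brief illustration, the attenuation factor can be computed per color channel. In this sketch the function name is hypothetical, a is an (r, g, b) triple of per-unit-distance attenuation constants assumed to lie in (0, 1], and s is the distance traveled inside the medium.

```python
import math


def beer_attenuation(a, s):
    """Per-channel factor I(s) / I(0) = exp(ln(a) * s), which equals a ** s."""
    return tuple(math.exp(math.log(channel) * s) for channel in a)
```

For a greenish glass with a = (0.5, 0.9, 0.5), two units of travel leave about (0.25, 0.81, 0.25) of the original intensity, so thicker parts of an object look darker and more saturated.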

To add transparent materials to our code, we need a way to determine when
a ray is going “into” an object. The simplest way to do this is to assume that all
objects are embedded in air with refractive index very close to 1.0, and that surface
normals point “out” (toward the air). The code segment for rays and dielectrics
with these assumptions is:

if (p is on a dielectric) then
    r = reflect(d, n)
    if (d · n < 0) then
        refract(d, n, n, t)
        c = -d · n
        k_r = k_g = k_b = 1
    else
        k_r = exp(-a_r t)
        k_g = exp(-a_g t)
        k_b = exp(-a_b t)
        if refract(d, -n, 1/n, t) then
            c = t · n
        else
            return k * color(p + t r)
    R_0 = (n - 1)^2 / (n + 1)^2
    R = R_0 + (1 - R_0)(1 - c)^5
    return k (R color(p + t r) + (1 - R) color(p + t t))

In this code, the second argument of refract is the normal vector n, the third
is the scalar refractive index n, and the last is the output direction vector t;
the t inside the exponentials is the scalar distance the ray has traveled inside
the medium. The code above assumes that the natural log has been folded into the
constants (a_r, a_g, a_b). The refract function returns false if there is total internal
reflection, and otherwise it fills in the last argument of the argument
list.


13.2 Instancing 

An elegant property of ray tracing is that it allows very natural instancing. The 
basic idea of instancing is to distort all points on an object by a transformation 
matrix before the object is displayed. For example, if we transform the unit circle
(in 2D) by a scale factor (2, 1) in x and y, respectively, then rotate it by 45°, and
move one unit in the x-direction, the result is an ellipse with an eccentricity of
2 and a long axis along the (x = -y)-direction centered at (0, 1) (Figure 13.4).
The key thing that makes that entity an “instance” is that we store the circle and 
the composite transform matrix. Thus, the explicit construction of the ellipse is 
left as a future operation at render time. 

The advantage of instancing in ray tracing is that we can choose the space in
which to do intersection. If the base object is composed of a set of points, one of
which is p, then the transformed object is composed of that set of points
transformed by matrix M, where the example point is transformed to Mp. If we have
a ray a + tb that we want to intersect with the transformed object, we can instead
intersect an inverse-transformed ray with the untransformed object (Figure 13.5).
There are two potential advantages to computing in the untransformed space (i.e.,
the right-hand side of Figure 13.5):

1. the untransformed object may have a simpler intersection routine, e.g., a
sphere versus an ellipsoid;

2. many transformed objects can share the same untransformed object thus
reducing storage, e.g., a traffic jam of cars, where individual cars are just
transforms of a few base (untransformed) models.

Figure 13.4. An instance of a circle with a series of three transforms (1. scale,
2. rotate, 3. move) is an ellipse.

Figure 13.5. The ray intersection problems in the two spaces are just simple transforms of
each other. The object is specified as a sphere plus matrix M. The ray is specified in the
transformed (world) space by location a and direction b.

As discussed in Section 6.2.2, surface normal vectors transform differently.
With this in mind and using the concepts illustrated in Figure 13.5, we can
determine the intersection of a ray and an object transformed by matrix M. If we
create an instance class of type surface, we need to create a hit
function:

instance::hit(ray a + tb, real t_0, real t_1, hit-record rec)
    ray r' = M^{-1} a + t M^{-1} b
    if (base-object→hit(r', t_0, t_1, rec)) then
        rec.n = (M^{-1})^T rec.n
        return true
    else
        return false

An elegant thing about this function is that the parameter rec.t does not need to
be changed, because it is the same in either space. Also note that we need not
compute or store the matrix M.

This brings up a very important point: the ray direction b must not be
restricted to a unit-length vector, or none of the infrastructure above works. For this
reason, it is useful not to restrict ray directions to unit vectors.
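
To make the idea concrete, here is a small Python sketch for one special case: the instance transform M is a nonuniform scale followed by a translation, applied to a unit sphere at the origin. Everything named here (hit_unit_sphere, hit_instance, the scale and translate parameters) is hypothetical; a general implementation would store a full inverse matrix instead. For this diagonal case, M^{-1} applied to a point is (p - translate) / scale, M^{-1} applied to a direction is b / scale, and (M^{-1})^T applied to a normal is n / scale.

```python
import math


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def hit_unit_sphere(o, d, t0, t1):
    """Nearest hit of ray o + t d with the unit sphere in [t0, t1], or None.

    The direction d is deliberately NOT assumed to be unit length.
    Returns (t, normal) where the normal equals the hit point.
    """
    qa = dot(d, d)
    qb = 2.0 * dot(o, d)
    qc = dot(o, o) - 1.0
    disc = qb * qb - 4.0 * qa * qc
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    for t in ((-qb - root) / (2.0 * qa), (-qb + root) / (2.0 * qa)):
        if t0 <= t <= t1:
            return t, tuple(o[i] + t * d[i] for i in range(3))
    return None


def hit_instance(scale, translate, a, b, t0, t1):
    """Intersect world-space ray a + t b with a scaled, translated unit sphere."""
    # Inverse-transform the ray into the base object's space.
    o = tuple((a[i] - translate[i]) / scale[i] for i in range(3))
    d = tuple(b[i] / scale[i] for i in range(3))
    rec = hit_unit_sphere(o, d, t0, t1)
    if rec is None:
        return None
    t, n_obj = rec
    # Normals transform by the inverse transpose; t is unchanged.
    n = tuple(n_obj[i] / scale[i] for i in range(3))
    length = math.sqrt(dot(n, n))
    return t, tuple(c / length for c in n)
```

Note that the inverse-transformed direction d is generally not unit length even when b is, which is exactly why the base object's hit routine must accept arbitrary-length directions.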


13.3 Constructive Solid Geometry 

One nice thing about ray tracing is that any geometric primitive whose intersection 
with a 3D line can be computed can be seamlessly added to a ray tracer. It turns 
out to also be straightforward to add constructive solid geometry (CSG) to a ray 
tracer (Roth, 1982). The basic idea of CSG is to use set operations to combine 
solid shapes. These basic operations are shown in Figure 13.6. The operations 
can be viewed as set operations. For example, we can consider C the set of all
points in the circle and S the set of all points in the square. The intersection
operation C ∩ S is the set of all points that are members of both C and S. The
other operations are analogous.

Although one can do CSG directly on the model, if all that is desired is an
image, we do not need to explicitly change the model. Instead, we perform the set
operations directly on the rays as they interact with a model. To make this natural,
we find all the intersections of a ray with a model rather than just the closest. For
example, a ray a + tb might hit a sphere at t = 1 and t = 2. In the context
of CSG, we think of this as the ray being inside the sphere for t ∈ [1, 2]. We
can compute these “inside intervals” for all of the surfaces and do set operations
on those intervals (recall Section 2.1.2). This is illustrated in Figure 13.7, where
the hit intervals are processed to indicate that there are two intervals inside the
difference object. The first hit for t > 0 is what the ray actually intersects.

In practice, the CSG intersection routine must maintain a list of intervals.
When the first hitpoint is determined, the material property and surface normal are
those associated with the hitpoint. In addition, you must pay attention to precision
issues because there is nothing to prevent the user from taking two objects that
abut and taking an intersection. This can be made robust by eliminating any
interval whose thickness is below a certain tolerance.
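
The interval bookkeeping can be sketched as follows. This Python fragment is one simple way to do it, not a tuned implementation: the function names are hypothetical, each object's inside intervals are given as a list of disjoint (t_enter, t_exit) pairs, and material and normal tracking are omitted.

```python
def combine(xs, ys, op):
    """Apply a boolean set operation to two lists of (lo, hi) intervals.

    op(in_x, in_y) decides whether a segment of the ray is inside the
    result; the segments are the spans between consecutive endpoints.
    """
    points = sorted({t for interval in xs + ys for t in interval})
    out = []
    for lo, hi in zip(points, points[1:]):
        mid = 0.5 * (lo + hi)
        in_x = any(a <= mid <= b for a, b in xs)
        in_y = any(a <= mid <= b for a, b in ys)
        if op(in_x, in_y):
            if out and out[-1][1] == lo:
                out[-1] = (out[-1][0], hi)  # merge with the previous segment
            else:
                out.append((lo, hi))
    return out


def union(xs, ys):
    return combine(xs, ys, lambda p, q: p or q)


def intersection(xs, ys):
    return combine(xs, ys, lambda p, q: p and q)


def difference(xs, ys):
    return combine(xs, ys, lambda p, q: p and not q)


def drop_thin(intervals, tolerance):
    """Eliminate intervals thinner than the tolerance."""
    return [(lo, hi) for lo, hi in intervals if hi - lo >= tolerance]


def first_hit(intervals):
    """Smallest interval endpoint with t > 0, or None if there is none."""
    candidates = [t for interval in intervals for t in interval if t > 0]
    return min(candidates) if candidates else None
```

For a ray that is inside one solid for t in [1, 4] and inside a second for t in [2, 3], the difference yields the two intervals [1, 2] and [3, 4], and the first hit is at t = 1.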


13.4 Distribution Ray Tracing 



circle and square. 



Figure 13.7. Intervals are processed to indicate how the ray hits the composite
object.


For some applications, ray-traced images are just too “clean.” This effect can be
mitigated using distribution ray tracing (Cook et al., 1984). The conventionally
ray-traced images look clean, because everything is crisp; the shadows are
perfectly sharp, the reflections have no fuzziness, and everything is in perfect focus.
Sometimes we would like to have the shadows be soft (as they are in real life), the
reflections be fuzzy as with brushed metal, and the image have variable degrees of
focus as in a photograph with a large aperture. While accomplishing these things
from first principles is somewhat involved (as is developed in Chapter 24), we
can get most of the visual impact with some fairly simple changes to the basic ray
tracing algorithm. In addition, the framework gives us a relatively simple way to
antialias (recall Section 8.3) the image.

Figure 13.8. Sixteen regular samples for a single pixel.

Figure 13.9. A simple scene rendered with one sample per pixel (lower left
half) and nine samples per pixel (upper right half).

Figure 13.10. Sixteen random samples for a single pixel.


13.4.1 Antialiasing 

Recall that a simple way to antialias an image is to compute the average color 
for the area of the pixel rather than the color at the center point. In ray tracing, 
our computational primitive is to compute the color at a point on the screen. If 
we average many of these points across the pixel, we are approximating the true 
average. If the screen coordinates bounding the pixel are $[i, i+1] \times [j, j+1]$,
then we can replace the loop:

    for each pixel (i, j) do
        c_ij = ray-color(i + 0.5, j + 0.5)

with code that samples on a regular n × n grid of samples within each pixel:

    for each pixel (i, j) do
        c = 0
        for p = 0 to n - 1 do
            for q = 0 to n - 1 do
                c = c + ray-color(i + (p + 0.5)/n, j + (q + 0.5)/n)
        c_ij = c/n^2

This is usually called regular sampling. The 16 sample locations in a pixel for
n = 4 are shown in Figure 13.8. Note that this produces the same answer as
rendering a traditional ray-traced image with one sample per pixel at $n_x n$ by $n_y n$
resolution and then averaging blocks of n by n pixels to get an $n_x$ by $n_y$ image.

One potential problem with taking samples in a regular pattern within a pixel
is that regular artifacts such as moiré patterns can arise. These artifacts can be
turned into noise by taking samples in a random pattern within each pixel as
shown in Figure 13.10. This is usually called random sampling and involves just
a small change to the code:

    for each pixel (i, j) do
        c = 0
        for p = 1 to n^2 do
            c = c + ray-color(i + ξ, j + ξ)
        c_ij = c/n^2

Here ξ is a call that returns a uniform random number in the range [0,1); each
occurrence of ξ is a separate call. Unfortunately, the noise can be quite
objectionable unless many samples are taken. A compromise is to make a hybrid
strategy that randomly perturbs a regular grid:

    for each pixel (i, j) do
        c = 0
        for p = 0 to n - 1 do
            for q = 0 to n - 1 do
                c = c + ray-color(i + (p + ξ)/n, j + (q + ξ)/n)
        c_ij = c/n^2

That method is usually called jittering or stratified sampling (Figure 13.11).
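
The jittered loop can be sketched in Python as follows. This is a minimal sketch rather than the book's code: the names `jittered_samples` and `pixel_color` are introduced here for illustration, and `ray_color` is assumed to be a callable that returns a single number for a screen position.

```python
import random

def jittered_samples(n, rng=random):
    """Return n*n points in [0,1)^2, one uniformly random point per grid cell."""
    return [((p + rng.random()) / n, (q + rng.random()) / n)
            for p in range(n) for q in range(n)]

def pixel_color(i, j, n, ray_color, rng=random):
    """Average ray_color over n*n jittered positions inside pixel (i, j)."""
    total = 0.0
    for dx, dy in jittered_samples(n, rng):
        total += ray_color(i + dx, j + dy)
    return total / (n * n)
```

Replacing `(p + rng.random()) / n` with `(p + 0.5) / n` gives regular sampling, and replacing it with `rng.random()` gives purely random sampling, so the three strategies differ only in how the offsets are generated.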


13.4.2 Soft Shadows 



Figure 13.11. Sixteen stratified (jittered) samples for a single pixel shown with
and without the bins highlighted. There is exactly one random sample taken
within each bin.


The reason shadows are hard to handle in standard ray tracing is that lights are 
infinitesimal points or directions and are thus either visible or invisible. In real 
life, lights have non-zero area and can thus be partially visible. This idea is shown 
in 2D in Figure 13.12. The region where the light is entirely invisible is called 
the umbra. The partially visible region is called the penumbra. There is not a 
commonly used term for the region not in shadow, but it is sometimes called the 
anti-umbra. 

The key to implementing soft shadows is to somehow account for the light
being an area rather than a point. An easy way to do this is to approximate the
light with a distributed set of N point lights, each with one Nth of the intensity
of the base light. This concept is illustrated at the left of Figure 13.13 where nine
lights are used. You can do this in a standard ray tracer, and it is a common trick
to get soft shadows in an off-the-shelf renderer. There are two potential problems
with this technique. First, typically dozens of point lights are needed to achieve
visually smooth results, which slows down the program a great deal. The second
problem is that the shadows have sharp transitions inside the penumbra.

Distribution ray tracing introduces a small change in the shadowing code.
Instead of representing the area light as a discrete number of point sources, we
represent it as an infinite number and choose one at random for each viewing ray.



Figure 13.12. A soft shadow has a gradual transition from the unshadowed to
shadowed region. The transition zone is the “penumbra” denoted by p in the figure.

Figure 13.13. Left: an area light can be approximated by some number of point lights; four
of the nine points are visible to p so it is in the penumbra. Right: a random point on the light
is chosen for the shadow ray, and it has some chance of hitting the light or not.

Figure 13.14. The geometry of a parallelogram light specified by a corner point
and two edge vectors.


This amounts to choosing a random point on the light for any surface point being 
lit as is shown at the right of Figure 13.13. 

If the light is a parallelogram specified by a corner point c and two edge
vectors a and b (Figure 13.14), then choosing a random point r is straightforward:

$$\mathbf{r} = \mathbf{c} + \xi_1 \mathbf{a} + \xi_2 \mathbf{b},$$

where $\xi_1$ and $\xi_2$ are uniform random numbers in the range [0,1).

We then send a shadow ray to this point as shown at the right in Figure 13.13. 
Note that the direction of this ray is not unit length, which may require some 
modification to your basic ray tracer depending upon its assumptions. 

We would really like to jitter points on the light. However, it can be dangerous
to implement this without some thought. We would not want to always have the
ray in the upper left-hand corner of the pixel generate a shadow ray to the upper
left-hand corner of the light. Instead we would like to scramble the samples, such
that the pixel samples and the light samples are each themselves jittered, but so
that there is no correlation between pixel samples and light samples. A good way
to accomplish this is to generate two distinct sets of $n^2$ jittered samples and pass
samples into the light source routine:

    for each pixel (i, j) do
        c = 0
        generate N = n^2 jittered 2D points and store in array r[ ]
        generate N = n^2 jittered 2D points and store in array s[ ]
        shuffle the points in array s[ ]
        for p = 0 to N - 1 do
            c = c + ray-color(i + r[p].x(), j + r[p].y(), s[p])
        c_ij = c/N

This shuffle routine eliminates any coherence between arrays r and s. The shadow
routine will just use the 2D random point stored in s[p] rather than calling the
random number generator. A shuffle routine for an array indexed from 0 to N - 1
is:

    for i = N - 1 downto 1 do
        choose random integer j between 0 and i inclusive
        swap array elements i and j
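
A possible Python rendering of the two jittered sets, the shuffle, and the mapping of a 2D sample onto the parallelogram light is sketched below. The function names are hypothetical, and points and vectors are assumed to be plain 3-tuples.

```python
import random

def jitter(n, rng):
    """n*n jittered points in [0,1)^2, one per grid cell."""
    return [((p + rng.random()) / n, (q + rng.random()) / n)
            for p in range(n) for q in range(n)]

def shuffle_in_place(a, rng):
    """For i = N-1 down to 1, swap a[i] with a[j] for a random j in 0..i."""
    for i in range(len(a) - 1, 0, -1):
        j = rng.randint(0, i)          # randint is inclusive on both ends
        a[i], a[j] = a[j], a[i]

def point_on_light(c, a, b, s):
    """Map a 2D sample s in [0,1)^2 to the point c + s[0]*a + s[1]*b."""
    return tuple(c[k] + s[0] * a[k] + s[1] * b[k] for k in range(3))

def pixel_and_light_samples(n, rng):
    """Pair each pixel sample r[p] with a decorrelated light sample s[p]."""
    r = jitter(n, rng)                 # pixel samples
    s = jitter(n, rng)                 # light samples
    shuffle_in_place(s, rng)           # break the cell-to-cell correlation
    return list(zip(r, s))
```

After the shuffle the light samples still cover every cell of the light exactly once; only their pairing with the pixel samples is randomized.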


13.4.3 Depth of Field 

The soft focus effects seen in most photos can be simulated by collecting light at
a non-zero size “lens” rather than at a point. This is called depth of field. The
lens collects light from a cone of directions that has its apex at a distance where
everything is in focus (Figure 13.15). We can place the “window” we are sampling
on the plane where everything is in focus (rather than at the z = n plane as we did
previously) and the lens at the eye. We call the plane where everything is in focus
the focus plane, and the distance to it is set by the user, just as the
distance to the focus plane in a real camera is set by the user or range finder.




Figure 13.15. The lens averages over a cone of directions that hit the pixel
location being sampled.

Figure 13.16. An example of depth of field. The caustic in the shadow of the wine glass is
computed using particle tracing as described in Chapter 24. (See also Plate V.)

Figure 13.17. To create depth-of-field effects, the eye is randomly selected
from a square region.


To be most faithful to a real camera, we should make the lens a disk. However, 
we will get very similar effects with a square lens (Figure 13.17). So we choose 
the side-length of the lens and take random samples on it. The origin of the 
view rays will be these perturbed positions rather than the eye position. Again, a 
shuffling routine is used to prevent correlation with the pixel sample positions. An 
example using 25 samples per pixel and a large disk lens is shown in Figure 13.16. 
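
As a rough sketch of the square-lens idea, the function below builds a view ray whose origin is a sample point on the lens and which passes through a point on the focus plane. The name `lens_ray` and its parameters are assumptions made for illustration: `u` and `v` are taken to be unit vectors spanning the lens plane, and `s` is a 2D sample in [0,1)^2, ideally from a shuffled jittered set.

```python
def lens_ray(eye, u, v, lens_side, focus_point, s):
    """Return (origin, direction) for a ray from a square lens toward focus_point.

    eye is the lens center; the origin is offset from it by up to half the
    side length along u and v.
    """
    du = (s[0] - 0.5) * lens_side
    dv = (s[1] - 0.5) * lens_side
    origin = tuple(eye[k] + du * u[k] + dv * v[k] for k in range(3))
    # The direction is not unit length; normalize it if your ray tracer requires that.
    direction = tuple(focus_point[k] - origin[k] for k in range(3))
    return origin, direction
```

Every ray generated for the same `focus_point` converges there, so geometry on the focus plane stays sharp while geometry nearer or farther is averaged over the lens and blurs.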


13.4.4 Glossy Reflection 



Figure 13.18. The reflection ray is perturbed to a random vector r'.


Some surfaces, such as brushed metal, are somewhere between an ideal mirror 
and a diffuse surface. Some discernible image is visible in the reflection but it 
is blurred. We can simulate this by randomly perturbing ideal specular reflection 
rays as shown in Figure 13.18. 

Only two details need to be worked out: how to choose the vector r' and what 
to do when the resulting perturbed ray is below the surface from which the ray is 
reflected. The latter detail is usually settled by returning a zero color when the 
ray is below the surface. 

To choose r', we again sample a random square. This square is perpendicular
to r and has width a, which controls the degree of blur. We can set up the square’s
orientation by creating an orthonormal basis with w = r using the techniques in
Section 2.4.6. Then, we create a random point in the 2D square with side length
a centered at the origin. If we have 2D sample points $(\xi, \xi') \in [0,1]^2$, then the
analogous point on the desired square is

$$u = -\frac{a}{2} + \xi a,$$

$$v = -\frac{a}{2} + \xi' a.$$

Because the square over which we will perturb is parallel to both the u and v
vectors, the ray r' is just

$$\mathbf{r}' = \mathbf{r} + u\mathbf{u} + v\mathbf{v}.$$

Note that r' is not necessarily a unit vector and should be normalized if your code 
requires that for ray directions. 
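
The perturbation can be sketched as below. This is an illustrative sketch: `perturb_reflection` and the helper names are not from the text, vectors are assumed to be 3-tuples, and the basis is built by crossing w with a helper vector that is guaranteed not to be parallel to it, which is one common way to construct an orthonormal basis from a single vector.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(a):
    length = math.sqrt(sum(c * c for c in a))
    return tuple(c / length for c in a)

def perturb_reflection(r, a, xi, xi2):
    """Return r' = r + u*U + v*V for a sample (xi, xi2) in [0,1]^2 and blur width a."""
    w = normalize(r)
    t = list(w)
    k = min(range(3), key=lambda i: abs(w[i]))
    t[k] = 1.0                         # helper vector, never parallel to w
    u_vec = normalize(cross(t, w))
    v_vec = cross(w, u_vec)            # unit length because w and u_vec are orthonormal
    u = -a / 2 + xi * a
    v = -a / 2 + xi2 * a
    return tuple(r[i] + u * u_vec[i] + v * v_vec[i] for i in range(3))
```

A caller would still need to test whether the returned vector points below the surface and return a zero color in that case, as described above.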


13.4.5 Motion Blur 

Figure 13.19. The bottom right sphere is in motion, and a blurred appearance results.
Image courtesy Chad Barb.

We can add a blurred appearance to objects as shown in Figure 13.19. This is
called motion blur and is the result of the image being formed over a non-zero
span of time. In a real camera, the aperture is open for some time interval during
which objects move. We can simulate the open aperture by setting a time variable
ranging from $T_0$ to $T_1$. For each viewing ray we choose a random time,

$$T = T_0 + \xi (T_1 - T_0).$$

We may also need to make some objects move with time. For example, we
might have a moving sphere whose center travels from $\mathbf{c}_0$ to $\mathbf{c}_1$ during the interval.
Given T, we could compute the actual center and do a ray-intersection with that
sphere. Because each ray is sent at a different time, each will encounter the sphere
at a different position, and the final appearance will be blurred. Note that the
bounding box for the moving sphere should bound its entire path so an efficiency
structure can be built for the whole time interval (Glassner, 1988).
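
A small sketch of the moving-sphere bookkeeping follows. It assumes the center moves linearly between its two endpoints, which the text does not require, and all function names are hypothetical.

```python
import random

def random_time(t0, t1, rng=random):
    """T = T0 + xi * (T1 - T0) with xi uniform in [0, 1)."""
    return t0 + rng.random() * (t1 - t0)

def sphere_center_at(c0, c1, t0, t1, time):
    """Center of the sphere at the given time, assuming linear motion from c0 to c1."""
    f = (time - t0) / (t1 - t0)
    return tuple(c0[k] + f * (c1[k] - c0[k]) for k in range(3))

def path_bounding_box(c0, c1, radius):
    """Axis-aligned box enclosing the sphere over its entire path."""
    lo = tuple(min(c0[k], c1[k]) - radius for k in range(3))
    hi = tuple(max(c0[k], c1[k]) + radius for k in range(3))
    return lo, hi
```

Each viewing ray would carry its own time from `random_time`, and the sphere intersection routine would call `sphere_center_at` with that time before running the usual ray-sphere test.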





Notes 

There are many, many other advanced methods that can be implemented in the 
ray-tracing framework. Some resources for further information are Glassner's An 
Introduction to Ray Tracing and Principles of Digital Image Synthesis, Shirley’s 
Realistic Ray Tracing, and Pharr and Humphreys’s Physically Based Rendering: 
From Theory to Implementation. 


Frequently Asked Questions 

• What is the best ray-intersection efficiency structure? 

The most popular structures are binary space partitioning trees (BSP trees), uniform
subdivision grids, and bounding volume hierarchies. Most people who use
BSP trees make the splitting planes axis-aligned, and such trees are usually called
k-d trees. There is no clear-cut answer for which is best, but all are much, much
better than brute-force search in practice. If I were to implement only one, it
would be the bounding volume hierarchy because of its simplicity and robustness.

• Why do people use bounding boxes rather than spheres or ellipsoids? 

Sometimes spheres or ellipsoids are better. However, many models have polygonal
elements that are tightly bounded by boxes, but they would be difficult to
tightly bound with an ellipsoid.



Sampling 


Many applications in graphics require “fair” sampling of unusual spaces, such as 
the space of all possible lines. For example, we might need to generate random 
edges within a pixel, or random sample points on a pixel that vary in density 
according to some density function. This chapter provides the machinery for such 
probability operations. These techniques will also prove useful for numerically 
evaluating complicated integrals using Monte Carlo integration, also covered in 
this chapter. 


14.1 Integration 

Although the words “integral” and “measure” often seem intimidating, they relate
to some of the most intuitive concepts found in mathematics, and they should not
be feared. For our very non-rigorous purposes, a measure is just a function that
maps subsets to $\mathbb{R}^+$ in a manner consistent with our intuitive notions of length,
area, and volume. For example, on the 2D real plane $\mathbb{R}^2$, we have the area measure
A which assigns a value to a set of points in the plane. Note that A is just a
function that takes pieces of the plane and returns area. This means the domain of
A is all possible subsets of $\mathbb{R}^2$, which we denote as the power set $\mathcal{P}(\mathbb{R}^2)$. Thus,
we can characterize A in arrow notation:

$$A : \mathcal{P}(\mathbb{R}^2) \rightarrow \mathbb{R}^+.$$




An example of applying the area measure shows that the area of the square with
side length one is one:

$$A([a, a+1] \times [b, b+1]) = 1,$$

where (a, b) is just the lower left-hand corner of the square. Note that a single
point such as (3, 7) is a valid subset of $\mathbb{R}^2$ and has zero area: A((3, 7)) = 0. The
same is true of the set of points S on the x-axis, S = (x, y) such that $(x, y) \in \mathbb{R}^2$
and y = 0, i.e., A(S) = 0. Such sets are called zero measure sets.

To be considered a measure, a function has to obey certain area-like properties.
For example, we have a function $\mu : \mathcal{P}(S) \rightarrow \mathbb{R}^+$. For $\mu$ to be a measure, the
following conditions must be true:

1. The measure of the empty set is zero: $\mu(\emptyset) = 0$.

2. The measure of two distinct (non-overlapping) sets together is the sum of
their measures alone. This rule with possible intersections is

$$\mu(A \cup B) = \mu(A) + \mu(B) - \mu(A \cap B),$$

where $\cup$ is the set union operator and $\cap$ is the set intersection operator.

When we actually compute measures, we usually use integration. We can think
of integration as really just notation:

$$A(S) \equiv \int_{\mathbf{x} \in S} dA(\mathbf{x}).$$

You can informally read the right-hand side as “take all points x in the region S, 
and sum their associated differential areas.” The integral is often written other 
ways including 



All of the above formulas represent “the area of region S.” We will stick with the
first one we used, because it is verbose enough to avoid ambiguity. To evaluate such
integrals analytically, we usually need to lay down some coordinate system and
use our bag of calculus tricks to solve the equations. But have no fear if those
skills have faded, as we usually have to numerically approximate integrals, and
that requires only a few simple techniques which are covered later in this chapter.

Given a measure on a set S, we can always create a new measure by weighting
with a non-negative function $w : S \rightarrow \mathbb{R}^+$. This is best expressed in integral
notation. For example, we can start with the example of the simple area measure
on $[0,1]^2$:

$$\int_{\mathbf{x} \in [0,1]^2} dA(\mathbf{x}),$$

and we can use a “radially weighted” measure by inserting a weighting function
of radius squared:

$$\int_{\mathbf{x} \in [0,1]^2} \|\mathbf{x}\|^2 \, dA(\mathbf{x}).$$

To evaluate this analytically, we can expand using a Cartesian coordinate system
with dA = dx dy:

$$\int_{\mathbf{x} \in [0,1]^2} \|\mathbf{x}\|^2 \, dA(\mathbf{x}) = \int_{x=0}^{1} \int_{y=0}^{1} (x^2 + y^2) \, dx \, dy.$$

The key thing here is that if you think of the $\|\mathbf{x}\|^2$ term as married to the dA term,
and that these together form a new measure, we can call that measure $\nu$. This
would allow us to write $\nu(S)$ instead of the whole integral. If this strikes you
as just a bunch of notation and bookkeeping, you are right. But it does allow us
to write down equations that are either compact or expanded depending on our
preference.
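
As a quick check of the Cartesian expansion (a worked evaluation added here for illustration, writing $\nu$ for the radially weighted measure), the integral over the unit square separates into two elementary integrals:

$$\nu([0,1]^2) = \int_{0}^{1}\!\int_{0}^{1} (x^2 + y^2)\,dx\,dy = \int_0^1 x^2\,dx + \int_0^1 y^2\,dy = \frac{1}{3} + \frac{1}{3} = \frac{2}{3}.$$

So the unit square has area measure 1 but radially weighted measure 2/3.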



14.1.1 Measures and Averages 


Measures really start paying off when taking averages of a function. You can
only take an average with respect to a particular measure, and you would like to
select a measure that is “natural” for the application or domain. Once a measure
is chosen, the average of a function f over a region S with respect to measure $\mu$ is

$$\text{average}(f) = \frac{\int_{\mathbf{x} \in S} f(\mathbf{x}) \, d\mu(\mathbf{x})}{\int_{\mathbf{x} \in S} d\mu(\mathbf{x})}.$$

For example, the average of the function $f(x, y) = x^2$ over $[0,2]^2$ with respect to
the area measure is

$$\text{average}(f) = \frac{\int_0^2 \int_0^2 x^2 \, dx \, dy}{\int_0^2 \int_0^2 dx \, dy} = \frac{4}{3}.$$


This machinery helps solve seemingly hard problems where choosing the measure
is the tricky part. Such problems often arise in integral geometry, a field that
studies measures on geometric entities, such as lines and planes. For example,
one might want to know the average length of a line through $[0,1]^2$. That is, by
definition,

$$\text{average}(\text{length}) = \frac{\int_{\text{lines } L \text{ through } [0,1]^2} \text{length}(L) \, d\mu(L)}{\int_{\text{lines } L \text{ through } [0,1]^2} d\mu(L)}.$$

All that is left, once we know that, is choosing the appropriate $\mu$ for the application.
This is dealt with for lines in the next section.



Figure 14.1. These two bundles of lines should have the same measure.
They have different intersection lengths with the y-axis so using db would be
a poor choice for a differential measure.

Figure 14.2. These two bundles of lines should have the same measure.
Since they have different values for change in slope, using dm would be a poor
choice for a differential measure.


14.1.2 Example: Measures on the Lines in the 2D Plane 


What measure $\mu$ is “natural”?

If you parameterize the lines as y = mx + b, you might think of a given line
as a point (m, b) in “slope-intercept” space. An easy measure to use would be
dm db, but this would not be a “good” measure in that not all equal size “bundles”
of lines would have the same measure. More precisely, the measure would not be
invariant with respect to change of coordinate system. For example, if you took
all lines through the square $[0,1]^2$, the measure of lines through it would not be
the same as the measure through a unit square rotated forty-five degrees. What
we would really like is a “fair” measure that does not change with rotation or
translation of a set of lines. This idea is illustrated in Figures 14.1 and 14.2.

To develop a natural measure on the lines, we should first start thinking of
them as points in a dual space. This is a simple concept: the line y = mx + b
can be specified as the point (m, b) in a slope-intercept space. This concept is
illustrated in Figure 14.3. It is more straightforward to develop a measure in
$(\phi, b)$ space. In that space b is the y-intercept, while $\phi$ is the angle the line makes
with the x-axis, as shown in Figure 14.4. Here, the differential measure $d\phi \, db$
almost works, but it would not be fair due to the effect shown in Figure 14.1. To
account for the larger span b that a constant width bundle of lines makes, we must
add a cosine factor:

$$d\mu = \cos\phi \, d\phi \, db.$$


It can be shown that this measure, up to a constant, is the only one that is invariant 
with respect to rotation and translation. 

This measure can be converted into an appropriate measure for other parameterizations
of the line. For example, the appropriate measure for (m, b) space
is

$$d\mu = \frac{dm \, db}{(1 + m^2)^{3/2}}.$$

For the space of lines parameterized in (u, v) space,

$$ux + vy + 1 = 0,$$

the appropriate measure is

$$d\mu = \frac{du \, dv}{(u^2 + v^2)^{3/2}}.$$

For lines parameterized in terms of (a, b), the x-intercept and y-intercept, the
measure is

$$d\mu = \frac{ab \, da \, db}{(a^2 + b^2)^{3/2}}.$$

Note that any of those spaces are equally valid ways to specify lines, and which is
best depends upon the circumstances. However, one might wonder whether there
exists a coordinate system where the measure of a set of lines is just an area in the
dual space. In fact, there is such a coordinate system, and it is delightfully simple;
it is the normal coordinates which specify a line in terms of the normal distance
from the origin to the line, and the angle the normal of the line makes with respect
to the x-axis (Figure 14.5). The implicit equation for such lines is

$$x \cos\theta + y \sin\theta - p = 0.$$

And, indeed, the measure in that space is

$$d\mu = dp \, d\theta.$$

We shall use these measures to choose fair random lines in a later section. 
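
Because the measure in normal coordinates is just an area, fair random lines can be drawn by sampling p and θ uniformly. The sketch below does this for the lines that hit a disk of radius R centered at the origin, which are exactly the lines with p between 0 and R. The disk, the function names, and the chord-length helper are assumptions chosen to keep the example small.

```python
import math
import random

def random_line_through_disk(R, rng=random):
    """Sample (p, theta) uniformly with respect to dp dtheta over lines hitting the disk."""
    p = R * rng.random()                    # normal distance from the origin
    theta = 2.0 * math.pi * rng.random()    # angle of the line's normal
    return p, theta

def chord_length(p, R):
    """Length of the segment that the line x cos(theta) + y sin(theta) = p cuts from the disk."""
    return 2.0 * math.sqrt(max(R * R - p * p, 0.0))
```

Averaging `chord_length` over many such lines should approach πR/2, which is the value obtained by integrating the chord length against dp dθ and dividing by the total measure.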


14.1.3 Example: Measure of Lines in 3D 

In 3D there are many ways to parameterize lines. Perhaps the simplest way is
to use their intersection with a particular plane along with some specification of
their orientation. For example, we could chart the intersection with the xy plane
along with the spherical coordinates of its orientation. Thus, each line would be
specified as a $(x, y, \theta, \phi)$ quadruple. This shows that lines in 3D are 4D entities,
i.e., they can be described as points in a 4D space.

The differential measure of a line should not vary with (x, y), but bundles of
lines with equal cross section should have equal measure. Thus, a fair differential
measure is

$$d\mu = dx \, dy \, \sin\theta \, d\theta \, d\phi.$$



Figure 14.3. The set of points on the line y = mx + b in (x, y) space can also
be represented by a single point in (m, b) space so the top line and the bottom
point represent the same geometric entity: a 2D line.

Figure 14.4. In angle-intercept space we parameterize the line by angle
$\phi \in [-\pi/2, \pi/2)$ rather than slope.


Figure 14.5. The normal coordinates of a line use the normal distance to the
origin and an angle to specify a line.

Another way to parameterize lines is to chart the intersection with two parallel
planes. For example, if the line intersects the plane z = 0 at (x = u, y = v) and
the plane z = 1 at (x = s, y = t), then the line can be described by the quadruple
(u, v, s, t). Note that, like the previous parameterization, this one is degenerate
for lines parallel to the xy plane. The differential measure is more complicated
for this parameterization although it can be approximated as

$$d\mu \approx du \, dv \, a \, ds \, dt,$$

for bundles of lines nearly parallel to the z-axis. This is the measure often implicitly
used in image-based rendering.

For sets of lines that intersect a sphere, we can use the parameterization of the
two points where the line intersects the sphere. If these are in spherical coordinates,
then the line can be described by the quadruple $(\theta_1, \phi_1, \theta_2, \phi_2)$ and the
measure is just the differential area associated with each point:

$$d\mu = \sin\theta_1 \, d\theta_1 \, d\phi_1 \, \sin\theta_2 \, d\theta_2 \, d\phi_2.$$

This implies that picking two uniform random endpoints on the sphere results in
a line with uniform density. This observation was used to compute form-factors
by Mateu Sbert in his dissertation (Sbert, 1997).

Note that sometimes we want to parameterize directed lines, and sometimes 
we want the order of the endpoints not to matter. This is a bookkeeping detail 
that is especially important for rendering applications where the amount of light 
flowing along a line is different in the two directions along the line. 
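
The two-endpoint construction can be sketched as follows for a unit sphere. The function names are hypothetical. A point that is uniform with respect to sin θ dθ dφ has a uniformly distributed cos θ, so the sketch samples z = cos θ directly instead of sampling θ.

```python
import math
import random

def uniform_on_sphere(rng=random):
    """Return a point distributed uniformly by area on the unit sphere."""
    z = 1.0 - 2.0 * rng.random()            # cos(theta), uniform in (-1, 1]
    phi = 2.0 * math.pi * rng.random()
    s = math.sqrt(max(0.0, 1.0 - z * z))    # sin(theta)
    return (s * math.cos(phi), s * math.sin(phi), z)

def random_line_through_sphere(rng=random):
    """Return two independent uniform endpoints; the line through them has uniform density."""
    return uniform_on_sphere(rng), uniform_on_sphere(rng)
```

The pair is ordered, so it describes a directed line; code that does not care about direction can simply ignore the order.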


14.2 Continuous Probability 

Many graphics algorithms use probability to construct random samples to solve 
integration and averaging problems. This is the domain of applied continuous 
probability which has basic connections to measure theory. 


14.2.1 One-Dimensional Continuous Probability Density Functions 

Loosely speaking, a continuous random variable x is a scalar or vector
quantity that “randomly” takes on some value from the real line
$\mathbb{R} = (-\infty, +\infty)$. The behavior of x is entirely described by the distribution
of values it takes. This distribution of values can be quantitatively described by
the probability density function (pdf), p, associated with x (the relationship is
denoted $x \sim p$). The probability that x assumes a particular value in some interval
[a, b] is given by the following integral:

$$\text{Probability}(x \in [a, b]) = \int_a^b p(x) \, dx. \tag{14.1}$$


Loosely speaking, the probability density function p describes the relative likelihood
of a random variable taking a certain value; if $p(x_1) = 6.0$ and $p(x_2) = 3.0$,
then a random variable with density p is twice as likely to have a value “near” $x_1$
than it is to have a value near $x_2$. The density p has two characteristics:

$$p(x) \ge 0 \quad \text{(probability is non-negative)}, \tag{14.2}$$

$$\int_{-\infty}^{+\infty} p(x) \, dx = 1 \quad (\text{Probability}(x \in \mathbb{R}) = 1). \tag{14.3}$$


As an example, the canonical random variable $\xi$ takes on values between zero
(inclusive) and one (non-inclusive) with uniform probability (here uniform simply
means each value for $\xi$ is equally likely). This implies that the probability density
function q for $\xi$ is

$$q(\xi) = \begin{cases} 1 & \text{if } 0 \le \xi < 1, \\ 0 & \text{otherwise.} \end{cases}$$

The space over which $\xi$ is defined is simply the interval [0,1). The probability
that $\xi$ takes on a value in a certain interval $[a, b] \subset [0, 1)$ is

$$\text{Probability}(a \le \xi \le b) = \int_a^b 1 \, dx = b - a.$$


14.2.2 One-Dimensional Expected Value 

The average value that a real function f of a one-dimensional random variable
with underlying pdf p will take on is called its expected value, E(f(x)) (sometimes
written Ef(x)):

$$E(f(x)) = \int f(x) p(x) \, dx.$$

The expected value of a one-dimensional random variable can be calculated by
setting f(x) = x. The expected value has a surprising and useful property: the
expected value of the sum of two random variables is the sum of the expected
values of those variables:

$$E(x + y) = E(x) + E(y),$$

for random variables x and y. Because functions of random variables are themselves
random variables, this linearity of expectation applies to them as well:

$$E(f(x) + g(y)) = E(f(x)) + E(g(y)).$$

An obvious question to ask is whether this property holds if the random variables
being summed are correlated (loosely speaking, variables that do not influence
one another are called independent). This linearity property in fact does hold
whether or not the variables are independent! This summation property is vital
for most Monte Carlo applications.
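
A small numerical illustration of linearity with strongly dependent variables is sketched below. It takes x to be the canonical uniform variable and y = x², so y is completely determined by x; these choices and the variable names are made up for the example. Here E(x) = 1/2 and E(y) = 1/3, so E(x + y) should be close to 5/6.

```python
import random

rng = random.Random(0)
N = 200_000
xs = [rng.random() for _ in range(N)]
ys = [x * x for x in xs]                 # y is completely determined by x

def mean(values):
    return sum(values) / len(values)

lhs = mean([x + y for x, y in zip(xs, ys)])   # sample estimate of E(x + y)
rhs = mean(xs) + mean(ys)                     # sample estimate of E(x) + E(y)
print(lhs, rhs)
```

The two printed numbers agree up to floating-point rounding, since the sample mean is itself linear, and both land near 5/6 even though x and y are far from independent.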

14.2.3 Multi-Dimensional Random Variables 

The discussion of random variables and their expected values extends naturally
to multi-dimensional spaces. Most graphics problems will be in such higher-dimensional
spaces. For example, many lighting problems are phrased on the
surface of the hemisphere. Fortunately, if we define a measure $\mu$ on the space the
random variables occupy, everything is very similar to the one-dimensional case.
Suppose the space S has associated measure $\mu$; for example, S is the surface of
a sphere and $\mu$ measures area. We can define a pdf $p : S \mapsto \mathbb{R}$, and if x is a
random variable with $x \sim p$, then the probability that x will take on a value in
some region $S_i \subset S$ is given by the integral

$$\text{Probability}(x \in S_i) = \int_{S_i} p(x) \, d\mu.$$

Here Probability(event) is the probability that event is true, so the integral is the
probability that x takes on a value in the region $S_i$.

In graphics, S is often an area ($d\mu = dA = dx \, dy$) or a set of directions (points
on a unit sphere: $d\mu = d\omega = \sin\theta \, d\theta \, d\phi$). As an example, a two-dimensional
random variable $\alpha$ is a uniformly distributed random variable on a disk of radius
R. Here uniformly means uniform with respect to area, e.g., the way a bad dart
player’s hits would be distributed on a dart board. Since it is uniform, we know
that $p(\alpha)$ is some constant. From the fact that the area of the disk is $\pi R^2$ and that
the total probability is one, we can deduce that

$$p(\alpha) = \frac{1}{\pi R^2}.$$

This means that the probability that $\alpha$ is in a certain subset $S_1$ of the disk is just

$$\text{Probability}(\alpha \in S_1) = \int_{S_1} \frac{1}{\pi R^2} \, dA.$$

This is all very abstract. To actually use this information, we need the integral in
a form we can evaluate. Suppose $S_1$ is the portion of the disk closer to the center
than the perimeter. If we convert to polar coordinates, then $\alpha$ is represented as
an $(r, \phi)$ pair, and $S_1$ is the region where r < R/2. Note that just because $\alpha$
is uniform, it does not imply that $\phi$ or r are necessarily uniform (in fact, $\phi$ is
uniform, and r is not uniform). The differential area dA is just $r \, dr \, d\phi$. Thus,

$$\text{Probability}\left(r < \frac{R}{2}\right) = \int_0^{2\pi} \int_0^{R/2} \frac{1}{\pi R^2} \, r \, dr \, d\phi = 0.25.$$

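
The value 0.25 can be checked numerically. The sketch below draws points uniformly by area using rejection sampling from the bounding square, which is one simple way to get an area-uniform distribution and not a method taken from the text, and then counts how many fall within half the radius. The function names are hypothetical.

```python
import random

def uniform_in_disk(R, rng):
    """Rejection sampling: draw from the bounding square until the point is inside the disk."""
    while True:
        x = R * (2.0 * rng.random() - 1.0)
        y = R * (2.0 * rng.random() - 1.0)
        if x * x + y * y < R * R:
            return x, y

def fraction_within_half_radius(R, N, rng):
    """Fraction of N area-uniform points with r < R/2."""
    inside = 0
    for _ in range(N):
        x, y = uniform_in_disk(R, rng)
        if x * x + y * y < (R / 2.0) ** 2:
            inside += 1
    return inside / N
```

The result is near one quarter for any radius, which matches the integral and also shows that r itself is not uniformly distributed: if it were, half of the points would have r < R/2.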

The formula for expected value of a real function applies to the multi-dimensional
case:

$$E(f(x)) = \int_S f(x) p(x) \, d\mu,$$

where $x \in S$, $f : S \mapsto \mathbb{R}$, and $p : S \mapsto \mathbb{R}$. For example, on the unit square
S = [0,1] × [0,1] and p(x, y) = 4xy, the expected value of the x coordinate for
$(x, y) \sim p$ is

$$E(x) = \int_S f(x, y) \, p(x, y) \, dA = \int_0^1 \int_0^1 4x^2 y \, dx \, dy = \frac{2}{3}.$$

Note that here f(x, y) = x.


14.2.4 Variance 

The variance, V(x), of a one-dimensional random variable is, by definition, the
expected value of the square of the difference between x and E(x):

$$V(x) \equiv E([x - E(x)]^2).$$

Some algebraic manipulation gives the non-obvious expression:

$$V(x) = E(x^2) - [E(x)]^2.$$

The expression $E([x - E(x)]^2)$ is more useful for thinking intuitively about variance,
while the algebraically equivalent expression $E(x^2) - [E(x)]^2$ is usually
convenient for calculations. The variance of a sum of random variables is the
sum of the variances if the variables are independent. This summation property
of variance is one of the reasons it is frequently used in analysis of probabilistic
models. The square root of the variance is called the standard deviation, $\sigma$, which
gives some indication of expected absolute deviation from the expected value.
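
The equivalence of the two variance expressions can be seen numerically. The sketch below uses samples of the canonical uniform variable, whose variance is 1/12; the choice of variable and all names are illustrative only.

```python
import random

rng = random.Random(0)
xs = [rng.random() for _ in range(100_000)]
n = len(xs)

mean_x = sum(xs) / n
var_def = sum((x - mean_x) ** 2 for x in xs) / n     # E([x - E(x)]^2)
var_alt = sum(x * x for x in xs) / n - mean_x ** 2   # E(x^2) - [E(x)]^2
print(var_def, var_alt)
```

Both values agree up to rounding and sit close to 1/12. The second form needs only running sums of x and x², which is why it is convenient in calculations.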

14.2.5 Estimated Means 

Many problems involve sums of independent random variables $x_i$, where the variables
share a common density p. Such variables are said to be independent identically
distributed (iid) random variables. When the sum is divided by the number
of variables, we get an estimate of E(x):

$$E(x) \approx \frac{1}{N} \sum_{i=1}^{N} x_i.$$

As N increases, the variance of this estimate decreases. We want N to be large
enough so that we have confidence that the estimate is “close enough.” However,
there are no sure things in Monte Carlo; we just gain statistical confidence that
our estimate is good. To be sure, we would have to have $N = \infty$. This confidence
is expressed by the Law of Large Numbers:

$$\text{Probability}\left[ E(x) = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} x_i \right] = 1.$$


14.3 Monte Carlo Integration 


In this section, the basic Monte Carlo solution methods for definite integrals are 
outlined. These techniques are then straightforwardly applied to certain integral 
problems. All of the basic material of this section is also covered in several of the 
classic Monte Carlo texts. (See the Notes section at the end of this chapter.) 

As discussed earlier, given a function $f : S \mapsto \mathbb{R}$ and a random variable
$x \sim p$, we can approximate the expected value of f(x) by a sum:

$$E(f(x)) = \int_{x \in S} f(x) p(x) \, d\mu \approx \frac{1}{N} \sum_{i=1}^{N} f(x_i). \tag{14.4}$$

Because the expected value can be expressed as an integral, the integral is also
approximated by the sum. The form of Equation (14.4) is a bit awkward; we
would usually like to approximate an integral of a single function g rather than a
product fp. We can accomplish this by substituting g = fp as the integrand:

$$\int_{x \in S} g(x) \, d\mu \approx \frac{1}{N} \sum_{i=1}^{N} \frac{g(x_i)}{p(x_i)}. \tag{14.5}$$

For this formula to be valid, p must be positive when g is nonzero.

So to get a good estimate, we want as many samples as possible, and we want
the g/p to have a low variance (g and p should have a similar shape). Choosing p
intelligently is called importance sampling, because if p is large where g is large,
there will be more samples in important regions. Equation (14.4) also shows
the fundamental problem with Monte Carlo integration: diminishing return. Because
the variance of the estimate is proportional to 1/N, the standard deviation
is proportional to $1/\sqrt{N}$. Since the error in the estimate behaves similarly to the
standard deviation, we will need to quadruple N to halve the error.
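
The diminishing return can be observed directly. The sketch below estimates the integral of a made-up integrand, g(x) = x² on [0, 1) with exact value 1/3, using uniform samples (p = 1), and compares the root-mean-square error at N = 100 and N = 400. The function names and the integrand are assumptions for illustration.

```python
import math
import random

def mc_estimate(g, N, rng):
    """Estimate the integral of g over [0, 1) from N uniform samples (p = 1)."""
    return sum(g(rng.random()) for _ in range(N)) / N

def rms_error(g, exact, N, trials, rng):
    """Root-mean-square error of the N-sample estimate over repeated trials."""
    sq = sum((mc_estimate(g, N, rng) - exact) ** 2 for _ in range(trials))
    return math.sqrt(sq / trials)

rng = random.Random(0)
g = lambda x: x * x                       # hypothetical integrand, exact integral 1/3
e_small = rms_error(g, 1.0 / 3.0, 100, 400, rng)
e_large = rms_error(g, 1.0 / 3.0, 400, 400, rng)
print(e_small / e_large)                  # expected to be near 2
```

Quadrupling the sample count should roughly halve the error, so the printed ratio fluctuates around 2 rather than 4.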

Another way to reduce variance is to partition S, the domain of the integral,
into several smaller domains $S_i$, and evaluate the integral as a sum of integrals
over the $S_i$. This is called stratified sampling, the technique that jittering employs
in pixel sampling (Chapter 4). Normally only one sample is taken in each $S_i$ (with
density $p_i$), and in this case the variance of the estimate is:

$$\text{var}\left( \sum_{i=1}^{N} \frac{g(x_i)}{p_i(x_i)} \right) = \sum_{i=1}^{N} \left( \int_{x \in S_i} \frac{g(x)^2}{p_i(x)} \, d\mu - \left( \int_{x \in S_i} g(x) \, d\mu \right)^2 \right). \tag{14.6}$$

It can be shown that the variance of stratified sampling is never higher than
unstratified if all strata have equal measure:

$$\int_{x \in S_i} p(x) \, d\mu = \frac{1}{N} \int_{x \in S} p(x) \, d\mu.$$

The most common example of stratified sampling in graphics is jittering for pixel
sampling as discussed in Section 13.4.

As an example of the Monte Carlo solution of an integral I, set g(x) equal to
x over the interval (0,4):

$$ I = \int_0^4 x\,dx = 8. \tag{14.7} $$


The impact of the shape of the function p on the variance of the N-sample
estimates is shown in Table 14.1. Note that the variance is reduced when the shape
of p is similar to the shape of g. The variance drops to zero if p = g/I, but
I is not usually known or we would not have to resort to Monte Carlo. One
important principle illustrated in Table 14.1 is that stratified sampling is often far
superior to importance sampling (Mitchell, 1996). Although the variance for this
stratification on I is inversely proportional to the cube of the number of samples,
there is no general result for the behavior of variance under stratification. There
are some functions for which stratification does no good. One example is a white
noise function, where the variance is constant for all regions. On the other hand,
most functions will benefit from stratified sampling, because the variance in each
subcell will usually be smaller than the variance of the entire domain.

Method        Sampling function    Variance        Samples needed for
                                                   standard error of 0.008
importance    (6 - x)/16           56.8 N^{-1}     887,500
importance    1/4                  21.3 N^{-1}     332,812
importance    (x + 2)/16           6.3 N^{-1}      98,437
importance    x/8                  0               1
stratified    1/4                  21.3 N^{-3}     70

Table 14.1. Variance for Monte Carlo estimate of $\int_0^4 x\,dx$.
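
The variance behavior described for this example can be checked empirically. The following is a minimal Python sketch, not code from the text: the function names, the seed, the choice N = 16, and the number of trials are all illustrative assumptions. It compares uniform sampling, the linear density (x + 2)/16, and one-sample-per-stratum stratified sampling for the integral of x over (0,4).

```python
import math
import random

def estimate_importance(n, sample, pdf, rng):
    """Estimate the integral of g(x) = x over [0, 4] as the mean of g/p."""
    total = 0.0
    for _ in range(n):
        x = sample(rng)
        total += x / pdf(x)
    return total / n

def estimate_stratified(n, rng):
    """One uniform sample in each of n equal-width strata of [0, 4]."""
    width = 4.0 / n
    # Within stratum i the density is 1/width, so g/p = x * width.
    return sum((i + rng.random()) * width * width for i in range(n))

# Uniform density p(x) = 1/4.
uniform = (lambda rng: 4.0 * rng.random(), lambda x: 0.25)
# Linear density p(x) = (x + 2)/16, sampled by inverting P(x) = (x^2 + 4x)/32.
linear = (lambda rng: -2.0 + math.sqrt(4.0 + 32.0 * rng.random()),
          lambda x: (x + 2.0) / 16.0)

def empirical_variance(estimator, trials=2000):
    """Sample variance of repeated independent estimates."""
    values = [estimator() for _ in range(trials)]
    mean = sum(values) / trials
    return sum((v - mean) ** 2 for v in values) / (trials - 1)

rng = random.Random(1)
n = 16
var_uniform = empirical_variance(lambda: estimate_importance(n, *uniform, rng))
var_linear = empirical_variance(lambda: estimate_importance(n, *linear, rng))
var_strat = empirical_variance(lambda: estimate_stratified(n, rng))
print(var_uniform, var_linear, var_strat)
```

With these settings the three printed variances should fall in the same order as the corresponding rows of Table 14.1, with the stratified estimate far below the other two.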


14.3.1 Quasi-Monte Carlo Integration 

A popular method for quadrature is to replace the random points in Monte Carlo
integration with quasi-random points. Such points are deterministic, but are in
some sense uniform. For example, on the unit square $[0,1]^2$, a set of N
quasi-random points should have the following property on a region of area A within
the square:

number of points in the region ≈ AN.

For example, a set of regular samples in a lattice has this property.

Quasi-random points can improve performance in many integration applications.
Sometimes care must be taken to make sure that they do not introduce
aliasing. It is especially nice that, in any application where calls are made to
random or stratified points in $[0,1]^d$, one can substitute d-dimensional quasi-random
points with no other changes.

The key intuition motivating quasi-Monte Carlo integration is that when
estimating the average value of an integrand, any set of sample points will do,
provided they are “fair.”
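
As one concrete illustration, the sketch below generates points of the Halton sequence, a widely used quasi-random construction. The text does not name a particular construction, so the choice of Halton points, the bases 2 and 3, and the function names are assumptions made for this example. It then checks the "points in a region ≈ AN" property on one rectangle.

```python
def radical_inverse(index, base):
    """Mirror the base-`base` digits of `index` about the radix point."""
    result = 0.0
    scale = 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * scale
        scale /= base
    return result

def halton_2d(n):
    """First n points of the 2D Halton sequence (bases 2 and 3)."""
    return [(radical_inverse(i, 2), radical_inverse(i, 3)) for i in range(1, n + 1)]

points = halton_2d(1024)
# Count points in the region [0, 0.5) x [0, 0.25), which has area A = 0.125.
count = sum(1 for (u, v) in points if u < 0.5 and v < 0.25)
print(count, 0.125 * len(points))
```

Because the points are deterministic, the count is the same on every run and should lie close to AN = 128.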


14.4 Choosing Random Points

We often want to generate sets of random or pseudorandom points on the unit
square for applications such as distribution ray tracing. There are several methods
for doing this, e.g., jittering (see Section 13.4). These methods give us a set of
N reasonably equidistributed points on the unit square $[0,1]^2$: $(u_1, v_1)$ through
$(u_N, v_N)$.

Sometimes, our sampling space may not be square (e.g., a circular lens), or
may not be uniform (e.g., a filter function centered on a pixel). It would be nice if
we could write a mathematical transformation that would take our equidistributed
points $(u_i, v_i)$ as input and output a set of points in our desired sampling space
with our desired density. For example, to sample a camera lens, the transformation
would take $(u_i, v_i)$ and output $(r_i, \phi_i)$ such that the new points are approximately
equidistributed on the disk of the lens. While we might be tempted to use the
transform

$$ \phi_i = 2\pi u_i, $$
$$ r_i = v_i R, $$

it has a serious problem. While the points do cover the lens, they do so
non-uniformly (Figure 14.6). What we need in this case is a transformation that takes
equal-area regions to equal-area regions, that is, one that takes uniform sampling
distributions on the square to uniform distributions on the new domain.

There are several ways to generate such non-uniform points or uniform points
on non-rectangular domains, and the following sections review the three most
often used: function inversion, rejection, and Metropolis.


14.4.1 Function Inversion 

If the density f(x) is one-dimensional and defined over the interval
$x \in [x_{\min}, x_{\max}]$, then we can generate random numbers $\alpha_i$ that have density
f from a set of uniform random numbers $\xi_i$, where $\xi_i \in [0,1]$. To do this, we
need the cumulative probability distribution function P(x):

$$ \text{Probability}(\alpha < x) = P(x) = \int_{x_{\min}}^{x} f(x')\,d\mu. $$

To get $\alpha_i$, we simply transform $\xi_i$:

$$ \alpha_i = P^{-1}(\xi_i), $$

where $P^{-1}$ is the inverse of P. If P is not analytically invertible, then numerical
methods will suffice, because an inverse exists for all valid probability distribution
functions.

Figure 14.6. The transform that takes the horizontal and vertical dimensions
uniformly to $(r, \phi)$ does not preserve relative area; not all of the resulting areas are
the same.

Note that analytically inverting a function is more confusing than it should be
due to notation. For example, if we have the function

$$ y = x^2, $$

for x > 0, then the inverse function expresses x as a function of y:

$$ x = \sqrt{y}. $$

When the function is analytically invertible, it is almost always that simple.
However, things are a little more opaque with the standard notation:

$$ f(x) = x^2, $$
$$ f^{-1}(x) = \sqrt{x}. $$

Here x is just a dummy variable. You may find it easier to use the less standard
notation:

$$ y = x^2, $$
$$ x = \sqrt{y}, $$

while keeping in mind that these are inverse functions of each other.

For example, to choose random points $x_i$ that have density

$$ p(x) = \frac{3x^2}{2} $$

on [-1, 1], we see that

$$ P(x) = \frac{x^3 + 1}{2}, $$

and

$$ P^{-1}(x) = \sqrt[3]{2x - 1}, $$

so we can “warp” a set of canonical random numbers $(\xi_1, \ldots, \xi_N)$ to the properly
distributed numbers

$$ (x_1, \ldots, x_N) = \left(\sqrt[3]{2\xi_1 - 1}, \ldots, \sqrt[3]{2\xi_N - 1}\right). $$

Of course, this same warping function can be used to transform “uniform” jittered
samples into nicely distributed samples with the desired density.
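
The warp for the density 3x²/2 on [-1, 1] can be sketched in a few lines of Python. The function names, the seed, and the sample count are illustrative assumptions; the jittered input follows the remark above that stratified canonical numbers can be warped in the same way as purely random ones.

```python
import math
import random

def inverse_cdf(xi):
    """P^{-1}(xi) = cbrt(2*xi - 1) for the density p(x) = 3x^2/2 on [-1, 1]."""
    t = 2.0 * xi - 1.0
    # A signed cube root; t ** (1/3) alone would fail for negative t.
    return math.copysign(abs(t) ** (1.0 / 3.0), t)

def jittered(n, rng):
    """One canonical random number in each of n equal strata of [0, 1)."""
    return [(i + rng.random()) / n for i in range(n)]

rng = random.Random(7)
xs = [inverse_cdf(xi) for xi in jittered(10000, rng)]
second_moment = sum(x * x for x in xs) / len(xs)
print(second_moment)
```

For this density the exact second moment is the integral of x² · 3x²/2 over [-1, 1], which is 3/5, so the printed value gives a quick sanity check on the warp.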

If we have a random variable $\alpha = (\alpha_x, \alpha_y)$ with two-dimensional density
f(x, y) defined on $[x_{\min}, x_{\max}] \times [y_{\min}, y_{\max}]$, then we need the two-dimensional
distribution function:

$$ \text{Probability}(\alpha_x < x \text{ and } \alpha_y < y) = F(x, y) = \int_{y_{\min}}^{y} \int_{x_{\min}}^{x} f(x', y')\,d\mu(x', y'). $$

We first choose an $x_i$ using the marginal distribution $F(x, y_{\max})$ and then choose
$y_i$ according to $F(x_i, y)/F(x_i, y_{\max})$. If f(x, y) is separable (expressible as
g(x)h(y)), then the one-dimensional techniques can be used on each dimension.

Returning to our earlier example, suppose we are sampling uniformly from
the disk of radius R, so $p(r, \phi) = 1/(\pi R^2)$. The two-dimensional distribution
function is

$$ \text{Probability}(r < r_0 \text{ and } \phi < \phi_0) = F(r_0, \phi_0) = \int_0^{\phi_0} \int_0^{r_0} \frac{r\,dr\,d\phi}{\pi R^2} = \frac{\phi_0 r_0^2}{2\pi R^2}. $$

This means that a canonical pair $(\xi_1, \xi_2)$ can be transformed to a uniform random
point on the disk:

$$ \phi = 2\pi\xi_1, $$
$$ r = R\sqrt{\xi_2}. $$

This mapping is shown in Figure 14.7.
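
A minimal Python sketch of this disk mapping follows. The function name, the seed, and the radius R = 2 are assumptions made for the example. Because the mapping preserves relative area, about a quarter of the points should land inside radius R/2.

```python
import math
import random

def sample_disk(xi1, xi2, radius=1.0):
    """Map a canonical pair to polar coordinates (r, phi) uniform over a disk."""
    phi = 2.0 * math.pi * xi1
    r = radius * math.sqrt(xi2)
    return r, phi

rng = random.Random(3)
R = 2.0
samples = [sample_disk(rng.random(), rng.random(), R) for _ in range(100000)]
# A uniform distribution puts a quarter of the points inside radius R/2.
inner_fraction = sum(1 for r, _ in samples if r < R / 2) / len(samples)
print(inner_fraction)
```

Replacing `math.sqrt(xi2)` with `xi2` reproduces the naive transform, which crowds points toward the center and pushes the inner fraction toward one half.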

Figure 14.7. A mapping that takes equal-area regions in the unit square
to equal-area regions in the disk.

To choose reflected ray directions for some realistic rendering applications,
we choose points on the unit hemisphere according to the density:

$$ p(\theta, \phi) = \frac{n + 1}{2\pi} \cos^n\theta, $$

where n is a Phong-like exponent, θ is the angle from the surface normal with
$\theta \in [0, \pi/2]$ (that is, on the upper hemisphere), and φ is the azimuthal angle
($\phi \in [0, 2\pi]$). The cumulative distribution function is

$$ P(\theta, \phi) = \int_0^{\phi} \int_0^{\theta} p(\theta', \phi') \sin\theta'\,d\theta'\,d\phi'. \tag{14.8} $$

The $\sin\theta'$ term arises because, on the sphere, $d\omega = \sin\theta\,d\theta\,d\phi$. When the
marginal densities are found, p (as expected) is separable, and we find that a
$(\xi_1, \xi_2)$ pair of canonical random numbers can be transformed to a direction by

$$ \theta = \arccos\left((1 - \xi_1)^{\frac{1}{n+1}}\right), $$
$$ \phi = 2\pi\xi_2. $$

Again, a nice thing about this is that a set of jittered points on the unit square can
be easily transformed to a set of jittered points on the hemisphere with the desired
distribution. Note that if n is set to 1, we have a diffuse distribution, as is often
needed.
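
The hemisphere transformation can be sketched in Python as below. The function name, the seed, and the choice of the +z axis as the surface normal are assumptions for illustration. The sketch computes cos θ directly rather than calling arccos, and checks the sample mean of cos θ against (n + 1)/(n + 2), the value obtained by integrating cos θ against this density.

```python
import math
import random

def sample_phong_lobe(xi1, xi2, n):
    """Return a unit vector about +z with density (n+1)/(2*pi) * cos^n(theta)."""
    cos_theta = (1.0 - xi1) ** (1.0 / (n + 1.0))
    sin_theta = math.sqrt(max(0.0, 1.0 - cos_theta * cos_theta))
    phi = 2.0 * math.pi * xi2
    return (math.cos(phi) * sin_theta, math.sin(phi) * sin_theta, cos_theta)

rng = random.Random(11)
dirs = [sample_phong_lobe(rng.random(), rng.random(), 1) for _ in range(200000)]
mean_cos = sum(d[2] for d in dirs) / len(dirs)
print(mean_cos)
```

For n = 1 the expected mean is 2/3; larger exponents concentrate the directions around the normal and push the mean toward 1.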

Often we must map the point on the sphere into an appropriate direction with
respect to a uvw basis. To do this, we can first convert the angles to a unit vector a:

$$ \mathbf{a} = (\cos\phi \sin\theta,\ \sin\phi \sin\theta,\ \cos\theta). $$

As an efficiency improvement, we can avoid taking trigonometric functions of
inverse trigonometric functions (e.g., cos(arccos θ)). For example, when n = 1
(a diffuse distribution), the vector a simplifies to

$$ \mathbf{a} = \left(\cos(2\pi\xi_2)\sqrt{\xi_1},\ \sin(2\pi\xi_2)\sqrt{\xi_1},\ \sqrt{1 - \xi_1}\right). $$



14.4.2 Rejection 

A rejection method chooses points according to some simple distribution and
rejects some of them so that the points that remain follow a more complex
distribution. There are several scenarios where rejection is used, and we show
some of these by example.

Suppose we want uniform random points within the unit circle. We can first
choose uniform random points $(x, y) \in [-1, 1]^2$ and reject those outside the
circle. If the function r() returns a canonical random number, then the procedure
is:

done = false
while (not done) do
    x = -1 + 2r()
    y = -1 + 2r()
    if (x² + y² < 1) then
        done = true

If we want a random number x ~ p and we know that $p : [a, b] \to \mathbb{R}$, and
that for all x, p(x) < m, then we can generate random points in the rectangle
$[a, b] \times [0, m]$ and take those where y < p(x):

done = false
while (not done) do
    x = a + r()(b - a)
    y = r()m
    if (y < p(x)) then
        done = true

This same idea can be applied to take random points on the surface of a sphere.
To pick a random unit vector with uniform directional distribution, we first pick a
random point in the unit sphere and then treat that point as a direction vector by
taking the unit vector in the same direction:

done = false
while (not done) do
    x = -1 + 2r()
    y = -1 + 2r()
    z = -1 + 2r()
    if ((l = sqrt(x² + y² + z²)) < 1) then
        done = true
x = x/l
y = y/l
z = z/l

Although the rejection method is usually simple to code, it is rarely compatible
with stratification. For this reason, it tends to converge more slowly and should
thus be used mainly for debugging, or in particularly difficult circumstances.
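
The three rejection procedures above might look as follows in Python. This is a sketch under assumptions: the function names and the seed are invented for the example, and the extra guard against a zero-length vector is an addition that the pseudocode does not include. The final lines draw from the density 3x²/2 on [-1, 1] with the bound m = 1.5.

```python
import math
import random

def random_in_unit_disk(rng):
    """Rejection: draw from the square [-1, 1]^2 until the point lies in the disk."""
    while True:
        x = -1.0 + 2.0 * rng.random()
        y = -1.0 + 2.0 * rng.random()
        if x * x + y * y < 1.0:
            return x, y

def random_unit_vector(rng):
    """Rejection in the cube [-1, 1]^3, then normalize to get a uniform direction."""
    while True:
        x = -1.0 + 2.0 * rng.random()
        y = -1.0 + 2.0 * rng.random()
        z = -1.0 + 2.0 * rng.random()
        length = math.sqrt(x * x + y * y + z * z)
        # Also reject the (vanishingly unlikely) zero vector to avoid dividing by 0.
        if 0.0 < length < 1.0:
            return x / length, y / length, z / length

def sample_density(p, a, b, m, rng):
    """Rejection for a 1D density p on [a, b] bounded above by m."""
    while True:
        x = a + rng.random() * (b - a)
        y = rng.random() * m
        if y < p(x):
            return x

rng = random.Random(5)
xs = [sample_density(lambda x: 1.5 * x * x, -1.0, 1.0, 1.5, rng) for _ in range(50000)]
second_moment = sum(x * x for x in xs) / len(xs)
print(second_moment)
```

Only about one candidate in three is accepted in the last example, since the area under the density is 1 while the bounding rectangle has area 3; a tighter bound wastes fewer samples.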


14.4.3 Metropolis 

The Metropolis method uses random mutations to produce a set of samples with
a desired density. This concept is used extensively in the Metropolis Light
Transport algorithm referenced in the chapter notes. Suppose we have a random point
$x_0$ in a domain S. Further, suppose for any point x, we have a way to generate
random $y \sim p_x$. We use the marginal notation $p_x(y) = p(x \to y)$ to denote this
density function. Now, suppose we let $x_1$ be a random point in S selected with
underlying density $p(x_0 \to x_1)$. We generate $x_2$ with density $p(x_1 \to x_2)$ and so
on. In the limit, where we generate an infinite number of samples, it can be proved
that the samples will have some underlying density determined by p regardless of
the initial point $x_0$.

Now, suppose we want to choose p such that the underlying density of samples
to which we converge is proportional to a function f(x) where f is a non-negative
function with domain S. Further, suppose we can evaluate f, but we have little
or no additional knowledge about its properties (such functions are common in
graphics). Also, suppose we have the ability to make “transitions” from $x_i$ to
$x_{i+1}$ with underlying density function $t(x_i \to x_{i+1})$. To add flexibility, further
suppose we add the potentially non-zero probability that $x_i$ transitions to itself,
i.e., $x_{i+1} = x_i$. We phrase this as generating a potential candidate $y \sim t(x_i \to y)$
and “accepting” this candidate (i.e., $x_{i+1} = y$) with probability $a(x_i \to y)$ and
rejecting it (i.e., $x_{i+1} = x_i$) with probability $1 - a(x_i \to y)$. Note that the sequence
$x_0, x_1, x_2, \ldots$ will be a random set, but there will be some correlation among
samples. They will still be suitable for Monte Carlo integration or density estimation,
but analyzing the variance of those estimates is much more challenging.

Now, suppose we are given a transition function $t(x \to y)$ and a function f(x)
whose distribution we want to mimic. Can we choose $a(y \to x)$ such that the
points are distributed in the shape of f? Or more precisely,

$$ \{x_0, x_1, x_2, \ldots\} \sim \frac{f}{\int_S f}. $$

It turns out this can be forced by making sure the $x_i$ are stationary in some strong
sense. If you visualize a huge collection of sample points x, you want the “flow”
between two points to be the same in each direction. If we assume the density of
points near x and y are proportional to f(x) and f(y), respectively, then the flow
in the two directions should be the same:

$$ \text{flow}(x \to y) = k f(x)\,t(x \to y)\,a(x \to y), $$
$$ \text{flow}(y \to x) = k f(y)\,t(y \to x)\,a(y \to x), $$

where k is some positive constant. Setting these two flows equal gives a
constraint on a:

$$ \frac{a(y \to x)}{a(x \to y)} = \frac{f(x)\,t(x \to y)}{f(y)\,t(y \to x)}. $$

Thus, if either $a(y \to x)$ or $a(x \to y)$ is known, so is the other. Making them
larger improves the chance of acceptance, so the usual technique is to set the
larger of the two to 1.

A difficulty in using the Metropolis sample generation technique is that it is 
hard to estimate how many points are needed before the set of points is “good.” 
Things are accelerated if the first n points are discarded, although choosing n 
wisely is non-trivial. 
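
The procedure can be sketched in Python for a simple one-dimensional target. Everything specific here is an assumption made for illustration: the target f(x) = x² on [-1, 1], the symmetric uniform-step transition, the step size, the burn-in length, and the function names. With a symmetric transition, t(x → y) = t(y → x), so setting the larger acceptance probability to 1 reduces the rule to a(x → y) = min(1, f(y)/f(x)).

```python
import random

def metropolis(f, x0, step, n, burn_in, rng):
    """Generate n correlated samples with density proportional to f.

    Uses a symmetric transition t (a uniform step of at most `step`), so the
    acceptance probability reduces to min(1, f(y) / f(x)).
    """
    x = x0
    samples = []
    for i in range(n + burn_in):
        y = x + step * (2.0 * rng.random() - 1.0)   # candidate y ~ t(x -> y)
        if rng.random() * f(x) < f(y):              # accept with probability a(x -> y)
            x = y                                   # otherwise x_{i+1} = x_i
        if i >= burn_in:
            samples.append(x)
    return samples

def f(x):
    """Unnormalized target: x^2 on [-1, 1], zero elsewhere."""
    return x * x if -1.0 <= x <= 1.0 else 0.0

rng = random.Random(2)
xs = metropolis(f, x0=0.5, step=0.5, n=200000, burn_in=1000, rng=rng)
second_moment = sum(x * x for x in xs) / len(xs)
print(second_moment)
```

The normalized target is 3x²/2, whose second moment is 3/5, so the printed value should be near 0.6. Note that f never needs to be normalized, and candidates outside [-1, 1] are rejected automatically because f is zero there.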


14.4.4 Example: Choosing Random Lines in the Square 

As an example of the full process of designing a sampling strategy, consider the
problem of finding random lines that intersect the unit square $[0,1]^2$. We want
this process to be fair; that is, we would like the lines to be uniformly distributed
within the square. Intuitively, we can see that there is some subtlety to this
problem; there are “more” lines at an oblique angle than in horizontal or vertical
directions. This is because the cross section of the square is not uniform.

Our first goal is to find a function-inversion method, if one exists, and then to
fall back on rejection or Metropolis if that fails. This is because we would like
to have stratified samples in line space. We try using normal coordinates first,
because the problem of choosing random lines in the square is just the problem
of finding uniform random points in whatever part of $(r, \theta)$ space corresponds to
lines in the square.

Consider the region where $-\pi/2 < \theta < 0$. What values of r correspond to
lines that hit the square? For those angles, all lines with $r < \cos\theta$ hit
the square, as shown in Figure 14.8. Similar reasoning in the other three quadrants
finds the region in $(r, \theta)$ space that must be sampled, as shown in Figure 14.9.
The equation of the boundary of that region, $r_{\max}(\theta)$, is

$$ r_{\max}(\theta) = \begin{cases} 0 & \text{if } \theta \in [-\pi, -\frac{\pi}{2}], \\ \cos\theta & \text{if } \theta \in [-\frac{\pi}{2}, 0], \\ \sqrt{2}\cos\left(\theta - \frac{\pi}{4}\right) & \text{if } \theta \in [0, \frac{\pi}{2}], \\ \sin\theta & \text{if } \theta \in [\frac{\pi}{2}, \pi]. \end{cases} $$

Because the region under $r_{\max}(\theta)$ is a simple function bounded below by r = 0,
we can sample it by first choosing θ according to the density function:

$$ p(\theta) = \frac{r_{\max}(\theta)}{\int_{-\pi}^{\pi} r_{\max}(\theta)\,d\theta}. $$

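The denominator here is 4. Now, we can compute the cumulative probability
distribution function:

$$ P(\theta) = \begin{cases} 0 & \text{if } \theta \in [-\pi, -\frac{\pi}{2}], \\ (1 + \sin\theta)/4 & \text{if } \theta \in [-\frac{\pi}{2}, 0], \\ \left(1 + \frac{\sqrt{2}}{2}\sin\left(\theta - \frac{\pi}{4}\right)\right)/2 & \text{if } \theta \in [0, \frac{\pi}{2}], \\ (3 - \cos\theta)/4 & \text{if } \theta \in [\frac{\pi}{2}, \pi]. \end{cases} $$

Figure 14.8. The largest distance r corresponds to a line hitting the square for
$\theta \in [-\pi/2, 0]$. Because the square has sidelength one, $r = \cos\theta$.

Figure 14.9. The maximum radius for lines hitting the unit square $[0,1]^2$ as a function of θ.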
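
We can invert this by manipulating $\xi_1 = P(\theta)$ into the form $\theta = g(\xi_1)$. This
yields

$$ \theta = \begin{cases} \arcsin(4\xi_1 - 1) & \text{if } \xi_1 < \frac{1}{4}, \\ \arcsin\left(\sqrt{2}\,(2\xi_1 - 1)\right) + \frac{\pi}{4} & \text{if } \xi_1 \in [\frac{1}{4}, \frac{3}{4}], \\ \arccos(3 - 4\xi_1) & \text{if } \xi_1 > \frac{3}{4}. \end{cases} $$

Once we have θ, then r is simply:

$$ r = \xi_2\,r_{\max}(\theta). $$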
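
The normal-coordinate sampling of lines can be sketched in Python as follows. The function names and the seed are assumptions for illustration, and the `clamp` helper is an added guard against rounding error that the derivation does not need on paper. The middle branch is obtained by solving the middle case of the cumulative distribution, (1 + (√2/2) sin(θ − π/4))/2 = ξ₁, for θ.

```python
import math
import random

def r_max(theta):
    """Largest r for which the line (r, theta) still hits the unit square."""
    if theta < -math.pi / 2:
        return 0.0
    if theta < 0.0:
        return math.cos(theta)
    if theta < math.pi / 2:
        return math.sqrt(2.0) * math.cos(theta - math.pi / 4)
    return math.sin(theta)

def cdf(theta):
    """Cumulative distribution P(theta) for theta in [-pi/2, pi]."""
    if theta < 0.0:
        return (1.0 + math.sin(theta)) / 4.0
    if theta < math.pi / 2:
        return (1.0 + math.sqrt(0.5) * math.sin(theta - math.pi / 4)) / 2.0
    return (3.0 - math.cos(theta)) / 4.0

def clamp(t):
    """Guard asin/acos against arguments a rounding error outside [-1, 1]."""
    return max(-1.0, min(1.0, t))

def sample_line(xi1, xi2):
    """Map a canonical pair to normal coordinates (r, theta) of a line."""
    if xi1 < 0.25:
        theta = math.asin(clamp(4.0 * xi1 - 1.0))
    elif xi1 < 0.75:
        theta = math.asin(clamp(math.sqrt(2.0) * (2.0 * xi1 - 1.0))) + math.pi / 4
    else:
        theta = math.acos(clamp(3.0 - 4.0 * xi1))
    return xi2 * r_max(theta), theta

rng = random.Random(9)
lines = [sample_line(rng.random(), rng.random()) for _ in range(1000)]
print(lines[0])
```

A useful consistency check is that `cdf(sample_line(xi, 1.0)[1])` returns `xi` for any `xi` in [0, 1], which confirms that the three inverse branches match the three nonzero pieces of P.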


As discussed earlier, there are many parameterizations of the line, and each has an
associated “fair” measure. We can generate random lines in any of these spaces
as well. For example, in slope-intercept space, the region that hits the square is
shown in Figure 14.10. By similar reasoning to the normal space, the density
function for the slope is

$$ p(m) = \frac{1 + |m|}{4} $$

with respect to the differential measure

$$ d\mu = \frac{dm}{(1 + m^2)^{3/2}}. $$

This gives rise to the cumulative distribution function:

$$ P(m) = \begin{cases} \frac{1}{4} + \frac{m + 1}{4\sqrt{1 + m^2}} & \text{if } m < 0, \\ \frac{3}{4} + \frac{m - 1}{4\sqrt{1 + m^2}} & \text{if } m \ge 0. \end{cases} $$

These can be inverted by solving two quadratic equations. Given an m generated
using $\xi_1$, we then have

$$ b = \begin{cases} (1 - m)\,\xi_2 & \text{if } \xi_1 < \frac{1}{2}, \\ -m + (1 + m)\,\xi_2 & \text{otherwise.} \end{cases} $$

This is not a better way than using normal coordinates; it is just an alternative
way.


Frequently Asked Questions 

• This chapter discussed probability but not statistics. What is the 
distinction? 

Probability is the study of how likely an event is. Statistics infers characteristics 
of large, but finite, populations of random variables. In that sense, statistics could 
be viewed as a specific type of applied probability. 

• Is Metropolis sampling the same as the Metropolis Light Transport 
Algorithm? 

No. The Metropolis Light Transport (Veach & Guibas, 1997) algorithm uses 
Metropolis sampling as part of its procedure, but it is specifically for rendering, 
and it has other steps as well. 


Notes 

The classic reference for geometric probability is Geometric Probability 
(Solomon, 1978). Another method for picking random edges in a square is given 
in Random-Edge Discrepancy of Supersampling Patterns (Dobkin & Mitchell, 
1993). More information on quasi-Monte Carlo methods for graphics can be 
found in Efficient Multidimensional Sampling (Kollig & Keller, 2002). Three 
classic and very readable books on Monte Carlo methods are Monte Carlo Methods
(Hammersley & Handscomb, 1964), Monte Carlo Methods, Basics (Kalos &
Whitlock, 1986), and The Monte Carlo Method (Sobel et al., 1975). 


Exercises

1. What is the average value of the function xyz in the unit cube
$(x, y, z) \in [0,1]^3$?

2. What is the average value of r on the unit-radius disk:
$(r, \phi) \in [0,1] \times [0, 2\pi)$?

3. Show that the uniform mapping of canonical random points $(\xi_1, \xi_2)$ to the
barycentric coordinates of any triangle is: $\beta = 1 - \sqrt{1 - \xi_1}$, and
$\gamma = (1 - \beta)\,\xi_2$.

4. What is the average length of a line inside the unit square? Verify your
answer by generating ten million random lines in the unit square and
averaging their lengths.

5. What is the average length of a line inside the unit cube? Verify your answer
by generating ten million random lines in the unit cube and averaging their
lengths.

6. Show from the definition of variance that $V(x) = E(x^2) - [E(x)]^2$.



Michael Gleicher


15 Curves


15.1 Curves

Intuitively, think of a curve as something you can draw with a pen. The curve is 
the set of points that the pen traces over an interval of time. While we usually 
think of a pen writing on paper (e.g., a curve that is in a 2D space), the pen could 
move in 3D to generate a space curve, or you could imagine the pen moving in
some other kind of space. 

Mathematically, definitions of curve can be seen in at least two ways: 

1. The continuous image of some interval in an n-dimensional space. 

2. A continuous map from a one-dimensional space to an n-dimensional space. 

Both of these definitions start with the idea of an interval range (the time over 
which the pen traces the curve). However, there is a significant difference: in 
the first definition, the curve is the set of points the pen traces (the image), while 
in the second definition, the curve is the mapping between time and that set of 
points. For this chapter, we use the first definition. 

A curve is an infinitely large set of points. The points in a curve have the 
property that any point has two neighbors, except for a small number of points 
that have one neighbor (these are the endpoints). Some curves have no endpoints, 
either because they are infinite (like a line) or they are closed (loop around and 
connect to themselves). 

Because the “pen” of the curve is infinitesimally thin, it is difficult to create
filled regions. While space-filling curves are possible (by having them fold over 
themselves infinitely many times), we do not consider such mathematical oddities 
here. Generally, we think of curves as the outlines of things, not the “insides.” 

The problem that we need to address is how to specify a curve—to give a 
name or representation to a curve so that we can represent it on a computer. For 
some curves, the problem of naming them is easy since they have known shapes: 
line segments, circles, elliptical arcs, etc. A general curve that does not have a
“named” shape is sometimes called a free-form curve. Because free-form curves
can take on just about any shape, they are much harder to specify.

There are three main ways to specify curves mathematically: 

1. Implicit curve representations define the set of points on a curve by giving a
procedure that can test to see if a point is on the curve. Usually, an implicit
curve representation is defined by an implicit function of the form

$$ f(x, y) = 0, $$

so that the curve is the set of points for which this equation is true. Note that
the implicit function f is a scalar function (it returns a single real number).

2. Parametric curve representations provide a mapping from a free parameter
to the set of points on the curve. That is, this free parameter provides an
index to the points on the curve. The parametric form of a curve is a function
that assigns positions to values of the free parameter. Intuitively, if you
think of a curve as something you can draw with a pen on a piece of paper,
the free parameter is time, ranging over the interval from the time that we
began drawing the curve to the time that we finish. The parametric function
of this curve tells us where the pen is at any instant in time:

$$ (x, y) = \mathbf{f}(t). $$

Note that the parametric function is a vector-valued function. This example
is a 2D curve, so the output of the function is a 2-vector; in 3D it would be
a 3-vector.

3. Generative or procedural curve representations provide procedures that can 
generate the points on the curve that do not fall into the first two categories. 
Examples of generative curve descriptions include subdivision schemes and 
fractals. 

Remember that a curve is a set of points. These representations give us ways 
to specify those sets. Any curve has many possible representations. For this 
reason, mathematicians typically are careful to distinguish between a curve and
its representations. In computer graphics we are often sloppy, since we usually 
only refer to the representation, not the actual curve itself. So when someone says 
“an implicit curve,” they are either referring to the curve that is represented by 
some implicit function or to the implicit function that is one of the representations 
of some curve. Such distinctions are not usually important, unless we need to 
consider different representations of the same curve. We will consider different 
curve representations in this chapter, so we will be more careful. When we use a 
term like “polynomial curve,” we will mean the curve that can be represented by 
the polynomial. 

By the definition given at the beginning of the chapter, for something to be a
curve it must have a parametric representation. However, many curves have other
representations. For example, a circle in 2D with its center at the origin and radius
equal to 1 can be written in implicit form as

$$ f(x, y) = x^2 + y^2 - 1 = 0, $$

or in parametric form as

$$ (x, y) = \mathbf{f}(t) = (\cos t, \sin t), \quad t \in [0, 2\pi). $$

The parametric form need not be the most convenient representation for a given
curve. In fact, it is possible to have curves with simple implicit or generative
representations for which it is difficult to find a parametric representation.

Different representations of curves have advantages and disadvantages. For
example, parametric curves are much easier to draw, because we can sample the
free parameter. Generally, parametric forms are the most commonly used in
computer graphics since they are easier to work with. Our focus will be on parametric
representations of curves.
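
The two representations of the unit circle can be compared directly in code. The Python sketch below uses illustrative function names that are not from the text. It samples the free parameter to get points for drawing, then uses the implicit function as a test that each point lies on the curve.

```python
import math

def implicit_circle(x, y):
    """Implicit form: zero exactly on the unit circle."""
    return x * x + y * y - 1.0

def parametric_circle(t):
    """Parametric form: the point on the unit circle at parameter t."""
    return math.cos(t), math.sin(t)

# Drawing is easy in parametric form: sample the free parameter.
n = 8
polyline = [parametric_circle(2.0 * math.pi * i / n) for i in range(n)]
# Every sampled point satisfies the implicit equation (up to rounding).
residuals = [abs(implicit_circle(x, y)) for x, y in polyline]
print(max(residuals))
```

The implicit form also classifies points off the curve: it is negative inside the circle and positive outside, which the parametric form cannot tell us without further work.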


15.1.1 Parameterizations and Re-Parameterizations 

A parametric curve refers to the curve that is given by a specific parametric
function over some particular interval. To be more precise, a parametric curve has a
given function that is a mapping from an interval of the parameters. It is often
convenient to have the parameter run over the unit interval from 0 to 1. When the
free parameter varies over the unit interval, we often denote the parameter as u.

If we view the parametric curve to be a line drawn with a pen, we can consider
u = 0 as the time when the pen is first set down on the paper and the unit of time
to be the amount of time it takes to draw the curve (u = 1 is the end of the curve).
The curve can be specified by a function that maps time (in these unit coordinates)
to positions. Basically, the specification of the curve is a function that can answer
the question, “Where is the pen at time u?”

If we are given a function f(t) that specifies a curve over interval [a, b], we
can easily define a new function $\mathbf{f}_2(u)$ that specifies the same curve over the unit
interval. We can first define

$$ g(u) = a + (b - a)u, $$

and then

$$ \mathbf{f}_2(u) = \mathbf{f}(g(u)). $$

The two functions, f and $\mathbf{f}_2$, both represent the same curve; however, they
provide different parameterizations of the curve. The process of creating a new
parameterization for an existing curve is called re-parameterization, and the
mapping from old parameters to the new ones (g, in this example) is called the
re-parameterization function.
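
Re-parameterization to the unit interval is a one-line composition in code. In the Python sketch below, the helper name `reparameterize` and the quarter-circle example are assumptions chosen for illustration.

```python
import math

def reparameterize(f, a, b):
    """Return f2(u) = f(g(u)) with g(u) = a + (b - a) * u, so u runs over [0, 1]."""
    def g(u):
        return a + (b - a) * u
    return lambda u: f(g(u))

def arc(t):
    """A quarter circle, originally parameterized over t in [0, pi/2]."""
    return math.cos(t), math.sin(t)

arc_unit = reparameterize(arc, 0.0, math.pi / 2)
print(arc_unit(0.0), arc_unit(1.0))
```

Both `arc` and `arc_unit` trace the same set of points; only the parameter value attached to each point changes.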

If we have defined a curve by some parameterization, infinitely many others
exist (because we can always re-parameterize). Being able to have multiple
parameterizations of a curve is useful, because it allows us to create
parameterizations that are convenient. However, it can also be problematic, because it makes
it difficult to compare two functions to see if they represent the same curve.

The essence of this problem is more general: the existence of the free parameter
(or the element of time) adds an invisible, potentially unknown element to our
representation of the curves. When we look at the curve after it is drawn, we don’t
necessarily know the timing. The pen might have moved at a constant speed over
the entire time interval, or it might have started slowly and sped up. For example,
while u = 0.5 is halfway through the parameter space, it may not be halfway
along the curve if the motion of the pen starts slowly and speeds up at the end.
Consider the following representations of a very simple curve:

$$ (x, y) = \mathbf{f}(u) = (u, u), $$
$$ (x, y) = \mathbf{f}(u) = (u^2, u^2), $$
$$ (x, y) = \mathbf{f}(u) = (u^5, u^5). $$

All three functions represent the same curve on the unit interval; however, when
u is not 0 or 1, f(u) refers to a different point depending on the representation of
the curve.

If we are given a parameterization of a curve, we can use it directly as our
specification of the curve, or we can develop a more convenient parameterization.
Usually, the natural parameterization is created in a way that is convenient (or
natural) for specifying the curve, so we don't have to know about how the speed
changes along the curve.

If we know that the pen moves at a constant velocity, then the values of the
free parameters have more meaning. Halfway through parameter space is halfway
along the curve. Rather than measuring time, the parameter can be thought to
measure length along the curve. Such parameterizations are called arc-length
parameterizations because they define curves by functions that map from the
distance along the curve (known as the arc length) to positions. We often use the
variable s to denote an arc length parameter.

Technically, a parameterization is an arc-length parameterization if its tangent
(that is, the derivative of the parameterization with respect to
the parameter) has constant magnitude. Expressed as an equation,

$$ \left\| \frac{d\mathbf{f}(s)}{ds} \right\| = c. $$

Computing the length along a curve can be tricky. In general, it is defined by
the integral of the magnitude of the derivative (intuitively, the magnitude of the
derivative is the velocity of the pen as it moves along the curve). So, given a value
for the parameter v, you can compute s (the arc-length distance along the curve
from the point f(0) to the point f(v)) as

$$ s = \int_0^v \left\| \frac{d\mathbf{f}(t)}{dt} \right\| dt, \tag{15.1} $$

where f(t) is a function that defines the curve with a natural parameterization.

Using the arc-length parameterization requires being able to solve Equation
(15.1) for the parameter value v, given s. For many of the kinds of curves we
examine, it cannot be
done in a closed-form (simple) manner and must be done numerically.
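
One simple numerical approach, sketched below in Python, approximates the arc length by summing short chords and then inverts it by bisection. The function names, the number of chord steps, and the tolerance are assumptions for illustration, and many other numerical schemes would serve equally well. The test curve is the (u², u²) parameterization of the segment from (0, 0) to (1, 1).

```python
import math

def arc_length(f, v, steps=1000):
    """Approximate s(v): the length of f from parameter 0 to v, by summing chords."""
    total = 0.0
    px, py = f(0.0)
    for i in range(1, steps + 1):
        x, y = f(v * i / steps)
        total += math.hypot(x - px, y - py)
        px, py = x, y
    return total

def parameter_at_length(f, s, v_max=1.0, tol=1e-9):
    """Solve s(v) = s for v by bisection; s(v) is nondecreasing in v."""
    lo, hi = 0.0, v_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if arc_length(f, mid) < s:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def f(u):
    """The straight segment from (0, 0) to (1, 1), traversed at non-constant speed."""
    return u * u, u * u

total = arc_length(f, 1.0)
halfway = parameter_at_length(f, 0.5 * total)
print(total, halfway)
```

For this curve the length up to parameter v is √2·v², so the point halfway along the curve sits at v = √0.5 ≈ 0.707 rather than at v = 0.5, which is exactly the mismatch between parameter space and distance that arc-length parameterization removes.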

Generally, we use the variable u to denote free parameters that range over the 
unit interval, s to denote arc-length free parameters, and t to represent parameters 
that aren’t one of the other two. 

15.1.2 Piecewise Parametric Representations 

For some curves, defining a parametric function that represents their shape is easy.
For example, lines, circles, and ellipses all have simple functions that define the
points they contain in terms of a parameter. For many curves, finding a function
that specifies their shape can be hard. The main strategy that we use to create
complex curves is divide-and-conquer: we break the curve into a number of simpler
smaller pieces, each of which has a simple description.

Figure 15.1. (a) A curve that can be easily represented as two lines; (b) a curve that can
be easily represented as a line and a circular arc; (c) a curve approximating curve (b) with
five line segments.

For example, consider the curves in Figure 15.1. The first two curves are
easily specified in terms of two pieces. In the case of the curve in Figure 15.1(b),
we need two different kinds of pieces: a line segment and a circular arc.

To create a parametric representation of a compound curve (like the curve
in Figure 15.1(b)), we need to have our parametric function switch between the
functions that represent the pieces. If we define our parametric functions over the
range $0 \le u \le 1$, then the curve in Figure 15.1(a) or (b) might be defined as

$$ \mathbf{f}(u) = \begin{cases} \mathbf{f}_1(2u) & \text{if } u \le 0.5, \\ \mathbf{f}_2(2u - 1) & \text{if } u > 0.5, \end{cases} \tag{15.2} $$

where $\mathbf{f}_1$ is a parameterization of the first piece, $\mathbf{f}_2$ is a parameterization of the
second piece, and both of these functions are defined over the unit interval.

We need to be careful in defining the functions $\mathbf{f}_1$ and $\mathbf{f}_2$ to make sure that the
pieces of the curve fit together. If $\mathbf{f}_1(1) \ne \mathbf{f}_2(0)$, then our curve pieces will not
connect and will not form a single continuous curve.
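
The switching between pieces can be sketched in Python as below. The helper name `piecewise` and the two line-segment pieces are assumptions for illustration. The sketch also checks the continuity condition that the first piece ends where the second begins.

```python
def piecewise(f1, f2):
    """Join two pieces, each defined over [0, 1], into one curve over [0, 1]."""
    def f(u):
        if u <= 0.5:
            return f1(2.0 * u)
        return f2(2.0 * u - 1.0)
    return f

def first_piece(u):
    """Segment from (0, 0) to (1, 0)."""
    return (u, 0.0)

def second_piece(u):
    """Segment from (1, 0) to (1, 1)."""
    return (1.0, u)

# The pieces must meet: f1(1) == f2(0), otherwise the curve has a gap.
assert first_piece(1.0) == second_piece(0.0)
curve = piecewise(first_piece, second_piece)
print(curve(0.0), curve(0.5), curve(1.0))
```

The same pattern extends to more than two pieces by dividing the unit interval into more subintervals and rescaling the parameter within each.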

To represent the curve in Figure 15.1(b), we needed to use two different types 
of pieces: a line segment and a circular arc. For simplicity’s sake, we may prefer 
to use a single type of piece. If we try to represent the curve in Figure 15.1(b) 
with only one type of piece (line segments), we cannot exactly recreate the curve 
(unless we use an infinite number of pieces). While the new curve made of line
segments (as in Figure 15.1(c)) may not be exactly the same shape as in
Figure 15.1(b), it might be close enough for our use. In such a case, we might prefer
the simplicity of using the simpler line segment pieces to having a curve that more 
accurately represents the shape. 

Also, notice that as we use an increasing number of pieces, we can get a better 
approximation. In the limit (using an infinite number of pieces), we can exactly 
represent the original shape.

One advantage to using a piecewise representation is that it allows us to make
a tradeoff between 

1. how well our represented curve approximates the real shape we are trying 
to represent; 

2. how complicated the pieces that we use are; 

3. how many pieces we use. 

So, if we are trying to represent a complicated shape, we might decide that a 
crude approximation is acceptable and use a small number of simple pieces. To 
improve the approximation, we can choose between using more pieces and using 
more complicated pieces. 
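The tradeoff between accuracy and the number of pieces can be measured directly. The sketch below approximates a quarter circle of radius 1 with equal-parameter line segments and reports the largest gap between the chords and the arc. The function names and the sampling density are assumptions made for this illustration.

```python
import math

def arc_point(u):
    """Quarter circle of radius 1 centered at the origin, u in [0, 1]."""
    angle = u * math.pi / 2
    return (math.cos(angle), math.sin(angle))

def max_error(n_pieces, samples=50):
    """Largest distance from the n-piece polyline to the unit circle.

    Every polyline point lies inside the circle, so its distance to the
    arc is 1 minus its distance from the center.
    """
    worst = 0.0
    for i in range(n_pieces):
        x0, y0 = arc_point(i / n_pieces)
        x1, y1 = arc_point((i + 1) / n_pieces)
        for s in range(samples + 1):
            t = s / samples
            x = (1 - t) * x0 + t * x1
            y = (1 - t) * y0 + t * y1
            worst = max(worst, 1.0 - math.hypot(x, y))
    return worst

# Error for 1, 2, 4, 8, and 16 pieces.
errors = [max_error(n) for n in (1, 2, 4, 8, 16)]
```

The values in `errors` shrink as the number of pieces grows, which is the behavior the text describes: more pieces buy a better approximation without making any single piece more complicated.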

In computer graphics practice, we tend to prefer using relatively simple curve 
pieces (either line segments, arcs, or polynomial segments). 

15.1.3 Splines 

Before computers, when draftsmen wanted to draw a smooth curve, one tool they 
employed was a stiff piece of metal that they would bend into the desired shape 
for tracing. Because the metal would bend, not fold, it would have a smooth 
shape. The stiffness meant that the metal would bend as little as possible to make 
the desired shape. This stiff piece of metal was called a spline. 

Mathematicians found that they could represent the curves created by a
draftsman’s spline with piecewise polynomial functions. Initially, they used the term
spline to mean a smooth, piecewise polynomial function. More recently, the term 
spline has been used to describe any piecewise polynomial function. We prefer 
this latter definition. 

For us, a spline is a piecewise polynomial function. Such functions are very 
useful for representing curves. 


15.2 Curve Properties 

To describe a curve, we need to give some facts about its properties. For “named” 
curves, the properties are usually specific according to the type of curve. For 
example, to describe a circle, we might provide its radius and the position of its 
center. For an ellipse, we might also provide the orientation of its major axis and 
the ratio of the lengths of the axes. For free-form curves, however, we need to
have a more general set of properties to describe individual curves.

Some properties of curves are attributed to only a single location on the curve,
while other properties require knowledge of the whole curve. For an intuition of 
the difference, imagine that the curve is a train track. If you are standing on the 
track on a foggy day you can tell that the track is straight or curved and whether 
or not you are at an end point. These are local properties. You cannot tell whether 
or not the track is a closed curve, or crosses itself, or how long it is. We call this
type of property a global property.

The study of local properties of geometric objects (curves and surfaces) is 
known as differential geometry. Technically, a property must satisfy some
mathematical restrictions to count as a differential property (roughly speaking, in the
train-track analogy, you would not be able to have a GPS or a compass). Rather
than worry about this distinction, we will use the term local property rather than
differential property.

Local properties are important tools for describing curves because they do not 
require knowledge about the whole curve. Local properties include 

• continuity, 

• position at a specific place on the curve, 

• direction at a specific place on the curve, 

• curvature (and other derivatives). 

Often, we want to specify that a curve includes a particular point. A curve is
said to interpolate a point if that point is part of the curve. A function $f$
interpolates a value $v$ if there is some value of the parameter $t$ for which $f(t) = v$. We
call the place of interpolation, that is, the value of $t$, the site.


15.2.1 Continuity 

It will be very important to understand the local properties of a curve where two
parametric pieces come together. If a curve is defined using an equation like
Equation (15.2), then we need to be careful about how the pieces are defined. If
$\mathbf{f}_1(1) \neq \mathbf{f}_2(0)$, then the curve will be “broken”: we would not be able to draw
the curve in a continuous stroke of a pen. We call the conditions that make the curve
pieces fit together continuity conditions because if they hold, the curve can be
drawn as a continuous piece. Because our definition of “curve” at the beginning
of the chapter requires a curve to be continuous, technically a “broken curve” is
not a curve.

In addition to the positions, we can also check that the derivatives of the pieces
match correctly. If $\mathbf{f}_1'(1) \neq \mathbf{f}_2'(0)$, then the combined curve will have an abrupt
change in its first derivative at the switching point; the first derivative will not
be continuous. In general, we say that a curve is $C^n$ continuous if all of its
derivatives up to $n$ match across pieces. We denote the position itself as the
zeroth derivative, so that the $C^0$ continuity condition means that the positions
of the curve are continuous, and $C^1$ continuity means that positions and
first derivatives are continuous. The definition of curve requires the curve to
be $C^0$.

An illustration of some continuity conditions is shown in Figure 15.2. A
discontinuity in the first derivative (the curve is $C^0$ but not $C^1$) is usually noticeable
because it displays a sharp corner. A discontinuity in the second derivative is
sometimes visually noticeable. Discontinuities in higher derivatives might
matter, depending on the application. For example, if the curve represents a motion,
an abrupt change in the second derivative is noticeable, so third derivative
continuity is often useful. If the curve is going to have a fluid flowing over it (for
example, if it is the shape for an airplane wing or boat hull), a discontinuity in the
fourth or fifth derivative might cause turbulence.

The type of continuity we have just introduced ($C^n$) is commonly referred to
as parametric continuity as it depends on the parameterization of the two curve
pieces. If the “speed” of each piece is different at the join, then the pieces will not be $C^1$ continuous.
For cases where we care about the shape of the curve, and not its parameterization,
we define geometric continuity that requires that the derivatives of the curve
pieces match when the curves are parameterized equivalently (for example,
using an arc-length parameterization). Intuitively, this means that the corresponding
derivatives must have the same direction, even if they have different magnitudes.



Figure 15.2. An illustration of various types of continuity between two curve segments.

So, if the $C^1$ continuity condition is

$$
\mathbf{f}_1'(1) = \mathbf{f}_2'(0),
$$

the $G^1$ continuity condition would be

$$
\mathbf{f}_1'(1) = k\,\mathbf{f}_2'(0),
$$

for some scalar $k > 0$ (so that the two derivatives point in the same direction). Generally, geometric continuity is less restrictive
than parametric continuity. A $C^n$ curve is also $G^n$ except when the parametric
derivatives vanish.
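The conditions above can be turned into a small check at a join. The sketch below works in 2D and takes the end position and first derivative of one piece and the start position and first derivative of the next. The function name, the string labels, and the tolerance are assumptions for illustration; "C1" here means "at least first-derivative continuous", since higher derivatives are not examined.

```python
def classify_join(end_pos, start_pos, end_deriv, start_deriv, tol=1e-9):
    """Classify how two 2D curve pieces meet, using first derivatives only."""
    def close(a, b):
        return all(abs(x - y) <= tol for x, y in zip(a, b))

    if not close(end_pos, start_pos):
        return "broken"          # positions differ: not even C0
    if close(end_deriv, start_deriv):
        return "C1"              # same direction and same speed
    # Parallel vectors have zero cross product; a positive dot product
    # means they point the same way (the scalar k is positive).
    cross = end_deriv[0] * start_deriv[1] - end_deriv[1] * start_deriv[0]
    dot = end_deriv[0] * start_deriv[0] + end_deriv[1] * start_deriv[1]
    if abs(cross) <= tol and dot > tol:
        return "G1"              # same direction, different speed
    return "C0"                  # positions match, directions do not
```

A zero derivative on either side falls through to "C0" in this sketch, which mirrors the caveat about vanishing parametric derivatives.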

15.3 Polynomial Pieces 

The most widely used representations of curves in computer graphics are built
by piecing together basic elements that are defined by polynomials and called
polynomial pieces. For example, a line element is given by a linear polynomial.
In Section 15.3.1, we give a formal definition and explain how to put pieces of
polynomial together.

15.3.1 Polynomial Notation 

Polynomials are functions of the form 

$$
f(t) = a_0 + a_1 t + a_2 t^2 + \cdots + a_n t^n. \tag{15.3}
$$

The $a_i$ are called the coefficients, and $n$ is called the degree of the polynomial if
$a_n \neq 0$. We also write Equation (15.3) in the form

$$
f(t) = \sum_{i=0}^{n} a_i t^i. \tag{15.4}
$$

We call this the canonical form of the polynomial.

We can generalize the canonical form to

$$
f(t) = \sum_{i=0}^{n} c_i b_i(t), \tag{15.5}
$$

where $b_i(t)$ is a polynomial. We can choose these polynomials in a convenient
form for different applications, and we call them basis functions or blending
functions (see Section 15.3.5). In Equation (15.4), the $t^i$ are the $b_i(t)$ of
Equation (15.5). If the set of $n + 1$ basis functions is chosen correctly, any polynomial of
degree $n$ or less can be represented by an appropriate choice of $\mathbf{c}$.

The canonical form does not always have convenient coefficients. For practical
purposes, throughout this chapter, we will find sets of basis functions such
that the coefficients are convenient ways to control the curves represented by the
polynomial functions.

To specify a curve embedded in two dimensions, one can either specify two
polynomials in $t$: one for how $x$ varies with $t$ and one for how $y$ varies with $t$;
or specify a single polynomial where each of the $a_i$ is a 2D point. An analogous
situation exists for any curve in an $n$-dimensional space.
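The second option, a single polynomial whose coefficients are points, is straightforward to evaluate. The following sketch uses Horner's rule on each coordinate at once; the function name `eval_poly` and the tuple representation of points are assumptions made for illustration.

```python
def eval_poly(coeffs, t):
    """Evaluate a polynomial in canonical form whose coefficients are points.

    coeffs is [a_0, a_1, ..., a_n], each a tuple with one entry per
    dimension. Horner's rule avoids computing powers of t explicitly.
    """
    result = [0.0] * len(coeffs[0])
    for a in reversed(coeffs):
        result = [r * t + c for r, c in zip(result, a)]
    return tuple(result)

# A 2D line f(t) = (1, 2) + t (3, 4), evaluated at t = 0.5.
point_on_line = eval_poly([(1, 2), (3, 4)], 0.5)
```

Because every coordinate is treated the same way, the same function serves for curves in 2D, 3D, or any other dimension.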


15.3.2 A Line Segment 

To introduce the concepts of piecewise polynomial curve representations, we will 
discuss line segments. In practice, line segments are so simple that the mathematical
derivations will seem excessive. However, by understanding this simple case,
things will be easier when we move on to more complicated polynomials. 

Consider a line segment that connects point $\mathbf{p}_0$ to $\mathbf{p}_1$. We could write the
parametric function over the unit domain for this line segment as

$$
\mathbf{f}(u) = (1 - u)\,\mathbf{p}_0 + u\,\mathbf{p}_1. \tag{15.6}
$$

By writing this in vector form, we have hidden the dimensionality of the points
and the fact that we are dealing with each dimension separately. For example,
were we working in 2D, we could have created separate equations:

$$
\begin{aligned}
f_x(u) &= (1 - u)\,x_0 + u\,x_1, \\
f_y(u) &= (1 - u)\,y_0 + u\,y_1.
\end{aligned}
$$

The line that we specify is determined by the two end points, but from now 
on we will stick to vector notation since it is cleaner. We will call the vector of 
control parameters, p, the control points, and each element of p, a control point. 

While describing a line segment by the positions of its endpoints is obvious 
and usually convenient, there are other ways to describe a line segment. For 
example, 

1. the position of the center of the line segment, the orientation, and the length; 

2. the position of one endpoint and the position of the second point relative to 
the first; 

3. the position of the middle of the line segment and one endpoint.

It is obvious that given one kind of a description of a line segment, we can switch
to another one.

A different way to describe a line segment is using the canonical form of the 
polynomial (as discussed in Section 15.3.1), 

$$
\mathbf{f}(u) = \mathbf{a}_0 + u\,\mathbf{a}_1. \tag{15.7}
$$

Any line segment can be represented either by specifying $\mathbf{a}_0$ and $\mathbf{a}_1$ or the
endpoints ($\mathbf{p}_0$ and $\mathbf{p}_1$). It is usually more convenient to specify the endpoints, because
we can compute the other parameters from the endpoints.

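To write the canonical form as a vector expression, we define a vector $\mathbf{u}$ that
is a vector of the powers of $u$:

$$
\mathbf{u} = \begin{bmatrix} 1 & u & u^2 & u^3 & \cdots & u^n \end{bmatrix},
$$

so that Equation (15.4) can be written as

$$
\mathbf{f}(u) = \mathbf{u} \cdot \mathbf{a}. \tag{15.8}
$$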

This vector notation will make transforming between different forms of the curve 
easier. 

Equation (15.8) describes a curve segment by the set of polynomial coefficients
for the simple form of the polynomial. We call such a representation the
canonical form. We will denote the parameters of the canonical form by $\mathbf{a}$.

While it is mathematically simple, the canonical form is not always the most 
convenient way to specify curves. For example, we might prefer to specify a 
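line segment by the positions of its endpoints. If we want to define $\mathbf{p}_0$ to be the
beginning of the segment (where the segment is when $u = 0$) and $\mathbf{p}_1$ to be the
end of the line segment (where the line segment is at $u = 1$), we can write

$$
\begin{aligned}
\mathbf{p}_0 &= \mathbf{f}(0) = \begin{bmatrix} 1 & 0 \end{bmatrix} \cdot \begin{bmatrix} \mathbf{a}_0 & \mathbf{a}_1 \end{bmatrix}, \\
\mathbf{p}_1 &= \mathbf{f}(1) = \begin{bmatrix} 1 & 1 \end{bmatrix} \cdot \begin{bmatrix} \mathbf{a}_0 & \mathbf{a}_1 \end{bmatrix}.
\end{aligned} \tag{15.9}
$$

We can solve these equations for $\mathbf{a}_0$ and $\mathbf{a}_1$:

$$
\begin{aligned}
\mathbf{a}_0 &= \mathbf{p}_0, \\
\mathbf{a}_1 &= \mathbf{p}_1 - \mathbf{p}_0.
\end{aligned}
$$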


Matrix Form for Polynomials 

While this first example was easy enough to solve, for more complicated examples
it will be easier to write Equation (15.9) in the form

$$
\begin{bmatrix} \mathbf{p}_0 \\ \mathbf{p}_1 \end{bmatrix} =
\begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}
\begin{bmatrix} \mathbf{a}_0 \\ \mathbf{a}_1 \end{bmatrix}.
$$

Alternatively, we can write

$$
\mathbf{p} = \mathbf{C}\,\mathbf{a}, \tag{15.10}
$$

where we call $\mathbf{C}$ the constraint matrix.¹ If having vectors of points bothers you,
you can consider each dimension independently (so that $\mathbf{p}$ is $[x_0 \; x_1]$ or $[y_0 \; y_1]$
and $\mathbf{a}$ is handled correspondingly).

We can solve Equation (15.10) for $\mathbf{a}$ by finding the inverse of $\mathbf{C}$. This inverse
matrix, which we will denote by $\mathbf{B}$, is called the basis matrix. The basis matrix
is very handy since it tells us how to convert between the convenient parameters
$\mathbf{p}$ and the canonical form $\mathbf{a}$ and, therefore, gives us an easy way to evaluate the
curve

$$
\mathbf{f}(u) = \mathbf{u}\,\mathbf{B}\,\mathbf{p}.
$$


We can find a basis matrix for whatever form of the curve that we want, providing 
that there are no non-linearities in the definition of the parameters. Examples of 
non-linearly defined parameters include the length and angle of the line segment. 

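Now, suppose we want to parameterize the line segment so that $\mathbf{p}_0$ is the halfway
point ($u = 0.5$), and $\mathbf{p}_1$ is the ending point ($u = 1$). To derive the basis
matrix for this parameterization, we set

$$
\begin{aligned}
\mathbf{p}_0 &= \mathbf{f}(0.5) = 1\,\mathbf{a}_0 + 0.5\,\mathbf{a}_1, \\
\mathbf{p}_1 &= \mathbf{f}(1) = 1\,\mathbf{a}_0 + 1\,\mathbf{a}_1.
\end{aligned}
$$

So

$$
\mathbf{C} = \begin{bmatrix} 1 & 0.5 \\ 1 & 1 \end{bmatrix},
$$

and therefore

$$
\mathbf{B} = \mathbf{C}^{-1} = \begin{bmatrix} 2 & -1 \\ -2 & 2 \end{bmatrix}.
$$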
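The recipe for a line segment (write the constraints as a matrix, invert it, then evaluate the curve as the power vector times the basis matrix times the control points) can be sketched in a few lines of Python. The helper names are hypothetical, and the 2x2 inverse is written out by hand to keep the example free of external libraries.

```python
def invert_2x2(m):
    """Invert a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("constraint matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def evaluate(u, basis, controls):
    """Evaluate f(u) = [1 u] B p for a line segment with point controls."""
    powers = [1.0, u]
    # weights = [1 u] B: one blending weight per control point
    weights = [sum(powers[i] * basis[i][j] for i in range(2)) for j in range(2)]
    dim = len(controls[0])
    return tuple(sum(w * p[k] for w, p in zip(weights, controls))
                 for k in range(dim))

# Controls are the two endpoints (constraints at u = 0 and u = 1).
endpoint_basis = invert_2x2([[1.0, 0.0], [1.0, 1.0]])

# Controls are the halfway point and the ending point (u = 0.5 and u = 1).
midpoint_basis = invert_2x2([[1.0, 0.5], [1.0, 1.0]])
```

With the midpoint parameterization and controls `[(1.0, 1.0), (2.0, 0.0)]`, evaluating at `u = 0.0` gives the implied starting point `(0.0, 2.0)`, which is the midpoint reflected through the endpoint.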



15.3.3 Beyond Line Segments 

Line segments are so simple that finding a basis matrix is trivial. However, it was 
good practice for curves of higher degree. First, let’s consider quadratics (curves 
of degree two). The advantage of the canonical form (Equation (15.4)) is that it 
works for these more complicated curves, just by letting n be a larger number. 

1 We assume the form of a vector (row or column) is obvious from the context, and we will skip all
of the transpose symbols for vectors.

A quadratic (a degree-two polynomial) has three coefficients, $\mathbf{a}_0$, $\mathbf{a}_1$, and $\mathbf{a}_2$.
These coefficients are not convenient for describing the shape of the curve. However,
we can use the same basis matrix method to devise more convenient parameters.
If we know the value of $u$, Equation (15.4) becomes a linear equation in the
parameters, and the linear algebra from the last section still works.

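Suppose that we wanted to describe our curves by the position of the beginning
($u = 0$), middle² ($u = 0.5$), and end ($u = 1$). Entering the appropriate
values into Equation (15.4):

$$
\begin{aligned}
\mathbf{p}_0 &= \mathbf{f}(0) &&= \mathbf{a}_0 + 0^1\,\mathbf{a}_1 + 0^2\,\mathbf{a}_2, \\
\mathbf{p}_1 &= \mathbf{f}(0.5) &&= \mathbf{a}_0 + 0.5^1\,\mathbf{a}_1 + 0.5^2\,\mathbf{a}_2, \\
\mathbf{p}_2 &= \mathbf{f}(1) &&= \mathbf{a}_0 + 1^1\,\mathbf{a}_1 + 1^2\,\mathbf{a}_2.
\end{aligned}
$$

So the constraint matrix is

$$
\mathbf{C} = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0.5 & 0.25 \\ 1 & 1 & 1 \end{bmatrix},
$$

and the basis matrix is

$$
\mathbf{B} = \mathbf{C}^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ -3 & 4 & -1 \\ 2 & -4 & 2 \end{bmatrix}.
$$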

There is an additional type of constraint (or parameter) that is sometimes convenient
to specify: the derivative of the curve (with respect to its free parameter)
at a particular value. Intuitively, the derivatives tell us how the curve is changing, 
so that the first derivative tells us what direction the curve is going, the second 
derivative tells us how quickly the curve is changing direction, etc. We will see 
examples of why it is useful to specify derivatives later. 

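For the quadratic,

$$
\mathbf{f}(u) = \mathbf{a}_0 + \mathbf{a}_1 u + \mathbf{a}_2 u^2,
$$

the derivatives are simple:

$$
\mathbf{f}'(u) = \frac{d\mathbf{f}}{du} = \mathbf{a}_1 + 2\,\mathbf{a}_2 u,
$$

and

$$
\mathbf{f}''(u) = \frac{d^2\mathbf{f}}{du^2} = 2\,\mathbf{a}_2.
$$

Or, more generally,

$$
\begin{aligned}
\mathbf{f}'(u) &= \sum_{i=1}^{n} i\,u^{i-1}\,\mathbf{a}_i, \\
\mathbf{f}''(u) &= \sum_{i=2}^{n} i\,(i-1)\,u^{i-2}\,\mathbf{a}_i.
\end{aligned}
$$

2 Notice that this is the middle of the parameter space, which might not be the middle of the curve
itself.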


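For example, consider a case where we want to specify a quadratic curve
segment by the position, first, and second derivative at its middle ($u = 0.5$).

$$
\begin{aligned}
\mathbf{p}_0 &= \mathbf{f}(0.5) &&= \mathbf{a}_0 + 0.5^1\,\mathbf{a}_1 + 0.5^2\,\mathbf{a}_2, \\
\mathbf{p}_1 &= \mathbf{f}'(0.5) &&= \mathbf{a}_1 + 2 \cdot 0.5\,\mathbf{a}_2, \\
\mathbf{p}_2 &= \mathbf{f}''(0.5) &&= 2\,\mathbf{a}_2.
\end{aligned}
$$

The constraint matrix is

$$
\mathbf{C} = \begin{bmatrix} 1 & 0.5 & 0.25 \\ 0 & 1 & 1 \\ 0 & 0 & 2 \end{bmatrix},
$$

and the basis matrix is

$$
\mathbf{B} = \mathbf{C}^{-1} = \begin{bmatrix} 1 & -0.5 & 0.125 \\ 0 & 1 & -0.5 \\ 0 & 0 & 0.5 \end{bmatrix}.
$$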


15.3.4 Basis Matrices for Cubics 

Cubic polynomials are popular in graphics (See Section 15.5). The derivations 
for the various forms of cubics are just like the derivations we’ve seen already in 
this section. We will work through one more example for practice. 

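A very useful form of a cubic polynomial is the Hermite form, where we
specify the position and first derivative at the beginning and end, that is,

$$
\begin{aligned}
\mathbf{p}_0 &= \mathbf{f}(0) &&= \mathbf{a}_0 + 0^1\,\mathbf{a}_1 + 0^2\,\mathbf{a}_2 + 0^3\,\mathbf{a}_3, \\
\mathbf{p}_1 &= \mathbf{f}'(0) &&= \mathbf{a}_1 + 2 \cdot 0^1\,\mathbf{a}_2 + 3 \cdot 0^2\,\mathbf{a}_3, \\
\mathbf{p}_2 &= \mathbf{f}(1) &&= \mathbf{a}_0 + 1^1\,\mathbf{a}_1 + 1^2\,\mathbf{a}_2 + 1^3\,\mathbf{a}_3, \\
\mathbf{p}_3 &= \mathbf{f}'(1) &&= \mathbf{a}_1 + 2 \cdot 1^1\,\mathbf{a}_2 + 3 \cdot 1^2\,\mathbf{a}_3.
\end{aligned}
$$

Thus, the constraint matrix is

$$
\mathbf{C} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 \end{bmatrix},
$$

and the basis matrix is

$$
\mathbf{B} = \mathbf{C}^{-1} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ -3 & -2 & 3 & -1 \\ 2 & 1 & -2 & 1 \end{bmatrix}.
$$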

We will discuss Hermite cubic splines in Section 15.5.2. 
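The derivations in this section all follow one pattern: each constraint (a position or a derivative at some parameter value) contributes one row to the constraint matrix, and the basis matrix is its inverse. The sketch below automates that pattern with exact rational arithmetic. The function names are hypothetical, and the Gauss-Jordan inverse is a generic textbook routine rather than anything specific to this chapter.

```python
from fractions import Fraction
from math import factorial

def constraint_row(u, order, degree):
    """Row of C for the constraint 'order-th derivative of f at u'.

    Entry i is the order-th derivative of u**i, which is
    i! / (i - order)! * u**(i - order), or 0 when i < order.
    """
    u = Fraction(u)
    row = []
    for i in range(degree + 1):
        if i < order:
            row.append(Fraction(0))
        else:
            scale = Fraction(factorial(i), factorial(i - order))
            row.append(scale * u ** (i - order))
    return row

def invert(matrix):
    """Gauss-Jordan inverse; raises StopIteration if the matrix is singular."""
    n = len(matrix)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Hermite cubic: position and first derivative at u = 0 and at u = 1.
hermite_constraints = [constraint_row(0, 0, 3), constraint_row(0, 1, 3),
                       constraint_row(1, 0, 3), constraint_row(1, 1, 3)]
hermite_basis = invert(hermite_constraints)
```

The same two functions reproduce the quadratic examples as well, for instance by asking for the position, first, and second derivative at `Fraction(1, 2)` with `degree=2`.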


15.3.5 Blending Functions 

If we know the basis matrix, $\mathbf{B}$, we can multiply it by the parameter vector, $\mathbf{u}$, to
get a vector of functions

$$
\mathbf{b}(u) = \mathbf{u}\,\mathbf{B}.
$$

Notice that we denote this vector by $\mathbf{b}(u)$ to emphasize the fact that its value
depends on the free parameter $u$. We call the elements of $\mathbf{b}(u)$ the blending functions,
because they specify how to blend the values of the control point vector
together:

$$
\mathbf{f}(u) = \sum_{i=0}^{n} b_i(u)\,\mathbf{p}_i. \tag{15.11}
$$

It is important to note that for a chosen value of $u$, Equation (15.11) is a linear
equation specifying a linear blend (or weighted average) of the control points.
This is true no matter what degree polynomials are “hidden” inside of the $b_i$
functions.

Blending functions provide a nice abstraction for describing curves. Any type 
of curve can be represented as a linear combination of its control points, where
the weights are computed as some arbitrary functions of the free parameter.
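As an illustration of blending functions, the sketch below multiplies the power vector by the Hermite basis matrix from Section 15.3.4 to get one weight per control, then forms the weighted sum. The names `HERMITE_BASIS`, `blending_weights`, and `blend` are hypothetical, and the control order (start position, start derivative, end position, end derivative) follows the Hermite derivation.

```python
HERMITE_BASIS = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [-3, -2, 3, -1],
    [2, 1, -2, 1],
]

def blending_weights(u, basis=HERMITE_BASIS):
    """Return b(u) = [1 u u^2 ...] B, one weight per control point."""
    powers = [u ** i for i in range(len(basis))]
    return [sum(powers[i] * basis[i][j] for i in range(len(basis)))
            for j in range(len(basis[0]))]

def blend(u, controls, basis=HERMITE_BASIS):
    """Evaluate the curve as a weighted sum of its controls."""
    weights = blending_weights(u, basis)
    dim = len(controls[0])
    return tuple(sum(w * c[k] for w, c in zip(weights, controls))
                 for k in range(dim))

# Start at (0, 0) heading right, end at (1, 1) heading left.
controls = [(0, 0), (1, 0), (1, 1), (-1, 0)]
start = blend(0, controls)
end = blend(1, controls)
```

At `u = 0` the weights are `[1, 0, 0, 0]` and at `u = 1` they are `[0, 0, 1, 0]`, so the curve passes through the two position controls, while the derivative controls only influence the shape in between.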


15.3.6 Interpolating Polynomials 

In general, a polynomial of degree $n$ can interpolate a set of $n + 1$ values. If
we are given a vector $\mathbf{p} = (p_0, \ldots, p_n)$ of points to interpolate and a vector
$\mathbf{t} = (t_0, \ldots, t_n)$ of increasing parameter values, $t_i \neq t_j$, we can use the methods
described in the previous sections to determine an $(n+1) \times (n+1)$ basis matrix that
gives us a function $f(t)$ such that $f(t_i) = p_i$. For any given vector $\mathbf{t}$, we need to
set up and solve an $(n+1) \times (n+1)$ linear system. This provides us with a set of
$n + 1$ basis functions that perform interpolation:

$$
f(t) = \sum_{i=0}^{n} p_i\,b_i(t).
$$

These interpolating basis functions can be derived in other ways. One particularly
elegant way to define them is the Lagrange form:

$$
b_i(t) = \prod_{j=0,\, j \neq i}^{n} \frac{t - t_j}{t_i - t_j}. \tag{15.12}
$$


There are more computationally efficient ways to express the interpolating basis 
functions than the Lagrange form (see De Boor (1978) for details). 
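A direct, unoptimized sketch of the Lagrange form follows. Each basis function is a product of ratios (t - t_j) / (t_i - t_j) over all sites other than its own, so it equals 1 at its own site and 0 at every other site. The function names are assumptions for illustration, and this is the straightforward form rather than one of the more efficient variants mentioned above.

```python
def lagrange_basis(i, t, sites):
    """Value at t of the i-th Lagrange basis function for the given sites."""
    weight = 1.0
    for j, t_j in enumerate(sites):
        if j != i:
            weight *= (t - t_j) / (sites[i] - t_j)
    return weight

def interpolate(points, sites, t):
    """Point at parameter t on the polynomial through points[i] at sites[i]."""
    dim = len(points[0])
    return tuple(
        sum(lagrange_basis(i, t, sites) * p[k] for i, p in enumerate(points))
        for k in range(dim)
    )

sites = [0.0, 0.5, 1.0]
points = [(0.0, 0.0), (1.0, 2.0), (2.0, 0.0)]
middle = interpolate(points, sites, 0.5)
```

Evaluating at a site returns the matching point, which is what it means for the curve to interpolate its control points.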

Interpolating polynomials provide a mechanism for defining curves that
interpolate a set of points. Figure 15.3 shows some examples. While it is possible
to create a single polynomial to interpolate any number of points, we rarely use
high-order polynomials to represent curves in computer graphics. Instead,
interpolating splines (piecewise polynomial functions) are preferred. Some reasons
for this are considered in Section 15.5.3.





(a) Interpolating polynomial through five points. (b) Interpolating polynomial through six points. (c) Interpolating polynomials through five and six points.

Figure 15.3. Interpolating polynomials through multiple points. Notice the extra wiggles
and over-shooting between points. In (c), when the sixth point is added, it completely
changes the shape of the curve due to the non-local nature of interpolating polynomials.

15.4 Putting Pieces Together


Now that we've seen how to make individual pieces of polynomial curves, we can 
consider how to put these pieces together. 

15.4.1 Knots 

The basic idea of a piecewise parametric function is that each piece is only used 
over some parameter range. For example, if we want to define a function that 
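has two piecewise linear segments that connect three points (as shown in
Figure 15.4(a)), we might define

$$
\mathbf{f}(u) = \begin{cases} \mathbf{f}_1(2u) & \text{if } 0 \le u \le \tfrac{1}{2}, \\ \mathbf{f}_2(2u - 1) & \text{if } \tfrac{1}{2} < u \le 1, \end{cases} \tag{15.13}
$$

where $\mathbf{f}_1$ and $\mathbf{f}_2$ are functions for each of the two line segments. Notice that
we have re-scaled the parameter for each of the pieces to facilitate writing their
equations as

$$
\mathbf{f}_1(u) = (1 - u)\,\mathbf{p}_1 + u\,\mathbf{p}_2.
$$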


For each polynomial in our piecewise function, there is a site (or parameter 
value) where it starts and ends. Sites where a piece function begins or ends are 
called knots. For the example in Equation (15.13), the values of the knots are 
0, 0.5, and 1. 

We may also write piecewise polynomial functions as the sum of basis functions,
each scaled by a coefficient. For example, we can re-write the two line
segments of Equation (15.13) as

$$
\mathbf{f}(u) = \mathbf{p}_1 b_1(u) + \mathbf{p}_2 b_2(u) + \mathbf{p}_3 b_3(u), \tag{15.14}
$$

where the function $b_1(u)$ is defined as

$$
b_1(u) = \begin{cases} 1 - 2u & \text{if } 0 \le u < \tfrac{1}{2}, \\ 0 & \text{otherwise,} \end{cases}
$$

and $b_2$ and $b_3$ are defined similarly. These functions are plotted in Figure 15.4(b).

Figure 15.4. (a) Two line segments connect three points; (b) the blending functions for each
of the points are graphed at right.
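The two views of the same two-segment curve can be compared in code. The first function below, for the weight of the first point, follows the definition in the text; the other two blending functions are one plausible way to fill in "defined similarly" and are assumptions derived from the two line segments, as are all the function names.

```python
def b1(u):
    """Weight of the first point: falls from 1 to 0 over the first half."""
    return 1.0 - 2.0 * u if 0.0 <= u < 0.5 else 0.0

def b2(u):
    """Weight of the middle point: rises to 1 at u = 0.5, then falls (assumed form)."""
    if 0.0 <= u < 0.5:
        return 2.0 * u
    if 0.5 <= u <= 1.0:
        return 2.0 - 2.0 * u
    return 0.0

def b3(u):
    """Weight of the last point: rises from 0 to 1 over the second half (assumed form)."""
    return 2.0 * u - 1.0 if 0.5 <= u <= 1.0 else 0.0

def blended(u, p1, p2, p3):
    """Sum of the three points, each scaled by its blending function."""
    return tuple(b1(u) * a + b2(u) * b + b3(u) * c for a, b, c in zip(p1, p2, p3))

def piecewise(u, p1, p2, p3):
    """The same curve as two independent line segments switched at u = 0.5."""
    if u <= 0.5:
        t = 2.0 * u
        return tuple((1 - t) * a + t * b for a, b in zip(p1, p2))
    t = 2.0 * u - 1.0
    return tuple((1 - t) * b + t * c for b, c in zip(p2, p3))
```

For any `u` in the unit range the two functions return the same point, and the three weights sum to 1, so the result is always a weighted average of the control points.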

The knots of a polynomial function are the combination of the knots of all of 
the pieces that are used to create it. The knot vector is a vector that stores all of 
the knot values in ascending order. 

Notice that in this section we have used two different mechanisms for combining
polynomial pieces: using independent polynomial pieces for different ranges
of the parameter and blending together piecewise polynomial functions. 

15.4.2 Using Independent Pieces 

In Section 15.3, we defined pieces of polynomials over the unit parameter range. 
If we want to assemble these pieces, we need to convert from the parameter of the 
overall function to the value of the parameter for the piece. The simplest way to 
do this is to define the overall curve over the parameter range [0, n] where n is the 
number of segments. Depending on the value of the parameter, we can shift it to 
the required range. 
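A minimal sketch of this parameter shift, assuming the overall curve is defined over the range from 0 to the number of segments and each piece expects a parameter in the unit range. The names `locate` and `evaluate_chain` are hypothetical, and line segments stand in for the polynomial pieces.

```python
def locate(u, n_segments):
    """Map u in [0, n] to (segment index, local parameter in [0, 1])."""
    if not 0 <= u <= n_segments:
        raise ValueError("parameter outside [0, n]")
    # The last knot belongs to the final segment, hence the clamp.
    index = min(int(u), n_segments - 1)
    return index, u - index

def evaluate_chain(u, points):
    """Evaluate a chain of line segments through the given points."""
    index, local = locate(u, len(points) - 1)
    start, end = points[index], points[index + 1]
    return tuple((1 - local) * s + local * e for s, e in zip(start, end))

corner_path = [(0, 0), (2, 0), (2, 2)]
halfway_up = evaluate_chain(1.5, corner_path)
```

Here `locate(1.25, 3)` selects segment 1 with local parameter 0.25, and `halfway_up` is the point halfway along the second segment.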

15.4.3 Putting Segments Together 

If we want to make a single curve from two line segments, we need to make sure 
that the end of the first line segment is at the same location as the beginning of the 
next. There are three ways to connect the two segments (in order of simplicity): 

1. Represent the line segment as its two endpoints, and then use the same point 
for both. We call this a shared-point scheme. 

2. Copy the value of the end of the first segment to the beginning of the second 
segment every time that the parameters of the first segment change. We call 
this a dependency scheme. 

3. Write an explicit equation for the connection, and enforce it through numerical
methods as the other parameters are changed.

While the simpler schemes are preferable since they require less work, they also
place more restrictions on the way the line segments are parameterized. For example,
if we want to use the center of the line segment as a parameter (so that the
user can specify it directly), we will use the beginning of each line segment and
the center of the line segment as their parameters. This will force us to use the
dependency scheme.

Notice that if we use a shared-point or dependency scheme, the total number
of control points is less than $n \times m$, where $n$ is the number of segments and $m$
is the number of control points for each segment; many of the control points of
the independent pieces will be computed as functions of other pieces. Notice
that if we use either the shared-point scheme for lines (each segment uses its two
endpoints as parameters and shares interior points with its neighbors), or if we
use the dependency scheme (such as the example one with the first endpoint and
midpoint), we end up with $n + 1$ controls for an $n$-segment curve.

Figure 15.5. A chain of line segments with local control and one with non-local control.

Dependency schemes have a more serious problem. A change in one place in 
the curve can propagate through the entire curve. This is called a lack of locality. 
Locality means that if you move a point on a curve it will only affect a local
region. The local region might be big, but it will be finite. If a curve’s controls do
not have locality, changing a control point may affect points infinitely far away.

To see locality, and the lack thereof, in action, consider two chains of line 
segments, as shown in Figure 15.5. One chain has its pieces parameterized by 
their endpoints and uses point-sharing to maintain continuity. The other has its 
pieces parameterized by an endpoint and midpoint and uses dependency propagation
to keep the segments together. The two segment chains can represent the
same curves: they are both a set of n connected line segments. However, because 
of locality issues, the endpoint-shared form is likely to be more convenient for the 
user. Consider changing the position of the first control point in each chain. For 
the endpoint-shared version, only the first segment will change, while all of the 
segments will be affected in the midpoint version, as in Figure 15.5. In fact, for 
any point moved in the endpoint-shared version, at most two line segments will 
change. In the midpoint version, all segments after the control point that is moved 
will change, even if the chain is infinitely long. 
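The difference in locality can be demonstrated by building both chains and counting how many segments change when the first control point moves. In this sketch the midpoint chain is controlled by its first endpoint plus one midpoint per segment, and each segment's end is computed by reflecting its start through its midpoint. All names and the sample coordinates are assumptions for illustration.

```python
def shared_chain(points):
    """Segments defined by consecutive shared endpoints."""
    return [(points[i], points[i + 1]) for i in range(len(points) - 1)]

def midpoint_chain(start, midpoints):
    """Segments defined by a start point and one midpoint per segment.

    Each end point is the start reflected through the midpoint, and it
    is copied forward as the start of the next segment (the dependency).
    """
    segments = []
    for mid in midpoints:
        end = tuple(2 * m - s for m, s in zip(mid, start))
        segments.append((start, end))
        start = end
    return segments

def count_changed(before, after):
    """Number of segments that differ between two versions of a chain."""
    return sum(1 for a, b in zip(before, after) if a != b)

# Both chains start out as the same three unit segments along the x axis.
shared_before = shared_chain([(0, 0), (1, 0), (2, 0), (3, 0)])
shared_after = shared_chain([(0, 1), (1, 0), (2, 0), (3, 0)])

mids = [(0.5, 0), (1.5, 0), (2.5, 0)]
dependent_before = midpoint_chain((0, 0), mids)
dependent_after = midpoint_chain((0, 1), mids)
```

Moving the first control point changes one segment of the shared-point chain but every segment of the midpoint chain, because each computed end point feeds the next segment.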

In this example, the dependency propagation scheme was the one that did not 
have local control. This is not always true. There are direct sharing schemes that 
are not local and propagation schemes that are local. 

We emphasize that locality is a convenience-of-control issue. While it is inconvenient
to have the entire curve change every time, the same changes can be
made to the curve. It simply requires moving several points in unison.


15.5 Cubics 

In graphics, when we represent curves using piecewise polynomials we usually 
use either line segments or cubic polynomials for the pieces. There are a number 
of reasons why cubics are popular in computer graphics: 

• Piecewise cubic polynomials allow for $C^2$ continuity, which is generally
sufficient for most visual tasks. The $C^1$ smoothness that quadratics offer is
often insufficient. The greater smoothness offered by higher-order polynomials
is rarely important.

• Cubic curves provide the minimum-curvature interpolants to a set of points.
That is, if you have a set of $n + 3$ points and define the “smoothest” curve
that passes through them (that is, the curve that has the minimum curvature
over its length), this curve can be represented as a piecewise cubic with $n$
segments.

• Cubic polynomials have a nice symmetry where position and derivative can 
be specified at the beginning and end. 

• Cubic polynomials have a nice tradeoff between the numerical issues in 
computation and the smoothness. 

Notice that we do not have to use cubics. They just tend to be a good tradeoff 
between the amount of smoothness and complexity. Different applications may 
have different tradeoffs. We focus on cubics since they are the most commonly 
used. 

The canonical form of a cubic polynomial is 

$$
\mathbf{f}(u) = \mathbf{a}_0 + \mathbf{a}_1 u + \mathbf{a}_2 u^2 + \mathbf{a}_3 u^3.
$$

As we discussed in Section 15.3, these canonical form coefficients are not a convenient
way to describe a cubic segment.

We seek forms of cubic polynomials for which the coefficients are a convenient
way to control the resulting curve represented by the cubic. One of the main
conveniences will be to provide ways to ensure the connectedness of the pieces
and the continuity between the segments.

Each cubic polynomial piece requires four coefficients or control points. That
means for a piecewise polynomial with $n$ pieces, we may require up to $4n$ control
points if no sharing between segments is done or dependencies used. More often,
some part of each segment is either shared or depends on an adjacent segment, so 
the total number of control points is much lower. Also, note that a control point 
might be a position or a derivative of the curve. 

Unfortunately, there is no single “best” representation for a piecewise cubic. 
It is not possible to have a piecewise polynomial curve representation that has all 
of the following desirable properties: 

1. each piece of the curve is a cubic; 

2. the curve interpolates the control points; 

3. the curve has local control; 

4. the curve has $C^2$ continuity.

We can have any three of these properties, but not all four; there are representations
that have any combination of three. In this book, we will discuss cubic
B-splines that do not interpolate their control points (but have local control and
are $C^2$); Cardinal splines and Catmull-Rom splines that interpolate their control
points and offer local control, but are not $C^2$; and natural cubics that interpolate
and are $C^2$, but do not have local control.

The continuity properties of cubics refer to the continuity between the segments
(at the knot points). The cubic pieces themselves have infinite continuity
in their derivatives (the way we have been talking about continuity so far). Note 
that if you have a lot of control points (or knots), the curve can be wiggly, which 
might not seem “smooth.” 


15.5.1 Natural Cubics 


With a piecewise cubic curve, it is possible to create a $C^2$ curve. To do this, we
need to specify the position and first and second derivative at the beginning of
each segment (so that we can make sure that it is the same as at the end of the
previous segment). Notice that each curve segment receives three out of its four
parameters from the previous curve in the chain. These $C^2$ continuous chains of
cubics are sometimes referred to as natural cubic splines.

For one segment of the natural cubic, we need to parameterize the cubic by 
the positions of its endpoints and the first and second derivative at the beginning 
point. The control points are therefore 


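$$
\begin{aligned}
\mathbf{p}_0 &= \mathbf{f}(0) &&= \mathbf{a}_0 + 0^1\,\mathbf{a}_1 + 0^2\,\mathbf{a}_2 + 0^3\,\mathbf{a}_3, \\
\mathbf{p}_1 &= \mathbf{f}'(0) &&= \mathbf{a}_1 + 2 \cdot 0^1\,\mathbf{a}_2 + 3 \cdot 0^2\,\mathbf{a}_3, \\
\mathbf{p}_2 &= \mathbf{f}''(0) &&= 2\,\mathbf{a}_2 + 6 \cdot 0^1\,\mathbf{a}_3, \\
\mathbf{p}_3 &= \mathbf{f}(1) &&= \mathbf{a}_0 + 1^1\,\mathbf{a}_1 + 1^2\,\mathbf{a}_2 + 1^3\,\mathbf{a}_3.
\end{aligned}
$$

Therefore, the constraint matrix is

$$
\mathbf{C} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 1 & 1 & 1 & 1 \end{bmatrix},
$$

and the basis matrix is

$$
\mathbf{B} = \mathbf{C}^{-1} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0.5 & 0 \\ -1 & -1 & -0.5 & 1 \end{bmatrix}.
$$

Given a set of $n$ control points, a natural cubic spline has $n - 1$ cubic segments.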
The first segment uses the control points to define its beginning position, ending 
position, and first and second derivative at the beginning. A dependency scheme 
copies the position, and first and second derivative of the end of the first segment 
for use in the second segment. 

A disadvantage of natural cubic splines is that they are not local. Any change 
in any segment may require the entire curve to change (at least the part after 
the change was made). To make matters worse, natural cubic splines tend to be 
ill-conditioned: a small change at the beginning of the curve can lead to large 
changes later. Another issue is that we only have control over the derivatives of 
the curve at its beginning. Segments after the beginning of the curve determine 
their derivatives from their beginning point. 
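
The dependency scheme and its sensitivity can be sketched in a few lines of Python. This is a minimal illustration rather than a production implementation: it assumes scalar control values (a 2D or 3D curve would apply it per coordinate), and the function names `natural_segment_coeffs`, `natural_spline`, and `evaluate` are hypothetical names chosen for this sketch.

```python
def natural_segment_coeffs(p_start, d1, d2, p_end):
    """Apply the natural-cubic basis matrix to (f(0), f'(0), f''(0), f(1)).

    Returns (a0, a1, a2, a3) for f(u) = a0 + a1*u + a2*u**2 + a3*u**3.
    """
    a0 = p_start
    a1 = d1
    a2 = 0.5 * d2
    a3 = -p_start - d1 - 0.5 * d2 + p_end
    return (a0, a1, a2, a3)


def natural_spline(points, d1, d2):
    """Chain one cubic per consecutive pair of (scalar) points.

    d1 and d2 are the first and second derivative at the very first point.
    Every later segment inherits its starting derivatives from the end of
    its predecessor, which is what makes the chain C^2 (and non-local).
    """
    segments = []
    for p_start, p_end in zip(points, points[1:]):
        a0, a1, a2, a3 = natural_segment_coeffs(p_start, d1, d2, p_end)
        segments.append((a0, a1, a2, a3))
        d1 = a1 + 2 * a2 + 3 * a3  # f'(1) of this segment
        d2 = 2 * a2 + 6 * a3       # f''(1) of this segment
    return segments


def evaluate(segment, u):
    """Evaluate one cubic segment at local parameter u in [0, 1]."""
    a0, a1, a2, a3 = segment
    return a0 + u * (a1 + u * (a2 + u * a3))
```

Building the chain through the values 0, 1, 0, 1 twice, once with an initial first derivative of 0 and once with 0.01, and comparing the two results at the middle of each segment shows the difference growing from one segment to the next, which is the ill-conditioning described above.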


15.5.2 Hermite Cubics 

Hermite cubic polynomials were introduced in Section 15.3.4. A segment of a
cubic Hermite spline allows the positions and first derivatives of both of its end
points to be specified. A chain of segments can be linked into a $C^1$ spline by
using the same values for the position and derivative of the end of one segment
and for the beginning of the next.

Given a set of n control points, where every other control point is a derivative
value, a cubic Hermite spline contains (n - 2)/2 cubic segments. The spline
interpolates the points, as shown in Figure 15.6, but can guarantee only $C^1$
continuity.

Hermite cubics are convenient because they provide local control over the
shape, and provide $C^1$ continuity. However, since the user must specify both
positions and derivatives, a special interface for the derivatives must be provided.
One possibility is to provide the user with points that represent where the
derivative vectors would end if they were “placed” at the position point.
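
As a rough sketch of how such an interface might feed a Hermite spline, the Python below evaluates a chain of Hermite segments and converts a user-dragged "handle" point into a derivative. It assumes the standard cubic Hermite blending functions (the form introduced in Section 15.3.4), points stored as tuples, and a control list that alternates positions and derivatives; the names `hermite_point`, `derivative_from_handle`, and `hermite_spline_point` are hypothetical.

```python
def hermite_point(p0, v0, p1, v1, u):
    """Point on one cubic Hermite segment.

    p0, p1 are the end positions and v0, v1 the end derivatives (tuples of
    equal length); u is the local parameter in [0, 1].
    """
    h_p0 = 2 * u**3 - 3 * u**2 + 1
    h_v0 = u**3 - 2 * u**2 + u
    h_p1 = -2 * u**3 + 3 * u**2
    h_v1 = u**3 - u**2
    return tuple(h_p0 * a + h_v0 * b + h_p1 * c + h_v1 * d
                 for a, b, c, d in zip(p0, v0, p1, v1))


def derivative_from_handle(position, handle):
    """Derivative vector implied by a handle point 'placed' at the position."""
    return tuple(h - p for p, h in zip(position, handle))


def hermite_spline_point(controls, t):
    """Evaluate a Hermite spline whose controls are [p0, v0, p1, v1, ...].

    With n controls there are (n - 2) / 2 segments; t is assumed to lie in
    [0, number_of_segments], and segment i covers i <= t <= i + 1.
    """
    segments = (len(controls) - 2) // 2
    i = min(int(t), segments - 1)  # clamp so t == segments is still valid
    u = t - i
    p0, v0, p1, v1 = controls[2 * i: 2 * i + 4]
    return hermite_point(p0, v0, p1, v1, u)
```

Because neighboring segments read the same position and derivative entries from the list, moving one position or handle changes only the two segments that touch it.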



Figure 15.6. A Hermite cubic spline made up of three segments.

15.5.3 Cardinal Cubics


A cardinal cubic spline is a type of $C^1$ interpolating spline made up of cubic
polynomial segments. Given a set of n control points, a cardinal cubic spline uses
n - 2 cubic polynomial segments to interpolate all of its points except for the first
and last.

Cardinal splines have a parameter called tension that controls how “tight” the 
curve is between the points it interpolates. The tension is a number in the range 
[0,1) that controls how the curve bends towards the next control point. For the 
important special case of t = 0, the splines are called Catmull-Rom splines. 

Each segment of the cardinal spline uses four control points. For segment i,
the points used are i, i + 1, i + 2, and i + 3, as the segments share three points
with their neighbors. Each segment begins at its second control point and ends at
its third control point. The derivative at the beginning of the curve is determined
by the vector between the first and third control points, while the derivative at the
end of the curve is given by the vector between the second and fourth points, as
shown in Figure 15.7.

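The tension parameter adjusts how much the derivatives are scaled. Specifically,
the derivatives are scaled by (1 - t)/2. The constraints on the cubic are
therefore

\[
\begin{aligned}
\mathbf{f}(0)  &= \mathbf{p}_2, \\
\mathbf{f}(1)  &= \mathbf{p}_3, \\
\mathbf{f}'(0) &= \tfrac{1}{2}(1 - t)(\mathbf{p}_3 - \mathbf{p}_1), \\
\mathbf{f}'(1) &= \tfrac{1}{2}(1 - t)(\mathbf{p}_4 - \mathbf{p}_2).
\end{aligned}
\]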

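Solving these equations for the control points (defining s = (1 - t)/2, and
renumbering the four points of the segment from zero, so that the segment runs
from $\mathbf{p}_1$ to $\mathbf{p}_2$) gives

\[
\begin{aligned}
\mathbf{p}_0 &= \mathbf{f}(1) - \tfrac{1}{s}\,\mathbf{f}'(0) &&= \mathbf{a}_0 + \left(1 - \tfrac{1}{s}\right)\mathbf{a}_1 + \mathbf{a}_2 + \mathbf{a}_3, \\
\mathbf{p}_1 &= \mathbf{f}(0) &&= \mathbf{a}_0, \\
\mathbf{p}_2 &= \mathbf{f}(1) &&= \mathbf{a}_0 + \mathbf{a}_1 + \mathbf{a}_2 + \mathbf{a}_3, \\
\mathbf{p}_3 &= \mathbf{f}(0) + \tfrac{1}{s}\,\mathbf{f}'(1) &&= \mathbf{a}_0 + \tfrac{1}{s}\,\mathbf{a}_1 + 2\,\tfrac{1}{s}\,\mathbf{a}_2 + 3\,\tfrac{1}{s}\,\mathbf{a}_3.
\end{aligned}
\]

Figure 15.7. A segment of a cardinal cubic spline interpolates its second and third
control points ($\mathbf{p}_2$ and $\mathbf{p}_3$), and uses its other points to
determine the derivatives at the beginning and end.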

This yields the cardinal matrix

\[
\begin{pmatrix}
0 & 1 & 0 & 0 \\
-s & 0 & s & 0 \\
2s & s - 3 & 3 - 2s & -s \\
-s & 2 - s & s - 2 & s
\end{pmatrix}.
\]


Since the third point of segment i is the second point of segment i + 1, adjacent
segments of the cardinal spline connect. Similarly, the same points are used to
specify the first derivative of each segment, providing $C^1$ continuity.

Figure 15.8. Cardinal splines through seven control points with varying values of tension
parameter t.
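
The cardinal matrix can be applied directly to evaluate a segment. The sketch below is one possible Python rendering, under the assumptions that points are tuples, that segments are numbered from zero, and that the rows of the matrix give the coefficients of 1, u, u^2, and u^3 in terms of the four points of the segment. The names `cardinal_coeffs` and `cardinal_point` are hypothetical.

```python
def cardinal_coeffs(p0, p1, p2, p3, tension=0.0):
    """Cubic coefficients (a0, a1, a2, a3) of one cardinal segment.

    Each coefficient is a tuple with the same dimension as the points.
    tension = 0 gives the Catmull-Rom case (s = 1/2).
    """
    s = (1.0 - tension) / 2.0
    rows = (
        (0.0, 1.0, 0.0, 0.0),
        (-s, 0.0, s, 0.0),
        (2 * s, s - 3.0, 3.0 - 2 * s, -s),
        (-s, 2.0 - s, s - 2.0, s),
    )
    pts = (p0, p1, p2, p3)
    dim = len(p0)
    return [tuple(sum(w * p[d] for w, p in zip(row, pts)) for d in range(dim))
            for row in rows]


def cardinal_point(points, i, u, tension=0.0):
    """Point at local parameter u in [0, 1] on segment i (0-based).

    Segment i uses points[i] .. points[i + 3]; it starts at points[i + 1]
    and ends at points[i + 2].
    """
    a0, a1, a2, a3 = cardinal_coeffs(*points[i:i + 4], tension=tension)
    return tuple(c0 + u * (c1 + u * (c2 + u * c3))
                 for c0, c1, c2, c3 in zip(a0, a1, a2, a3))
```

Evaluating segment i at u = 1 and segment i + 1 at u = 0 returns the same shared control point, which is the connection property noted above.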

Cardinal splines are useful because they provide an easy way to interpolate
a set of points with $C^1$ continuity and local control. They are only $C^1$, so they
sometimes get “kinks” in them. The tension parameter gives some control over
what happens between the interpolated points, as shown in Figure 15.8, where
several cardinal splines through the same set of points are shown. The curves use
the same control points, but they use different values for the tension parameter.
Note that the first and last control points are not interpolated.

Given a set of n points to interpolate, you might wonder why we might prefer
to use a cardinal cubic spline (that is, a set of n - 2 cubic pieces) rather than a
single, order n polynomial as described in Section 15.3.6. Some of the disadvantages
of the interpolating polynomial are:

• The interpolating polynomial tends to overshoot the points, as seen in
Figure 15.9. This overshooting gets worse as the number of points grows
larger. The cardinal splines tend to be well behaved in between the points.

• Control of the interpolating polynomial is not local. Changing a point at the 
beginning of the spline affects the entire spline. Cardinal splines are local: 
any place on the spline is affected by its four neighboring points at most. 

• Evaluation of the interpolating polynomial is not local. Evaluating a point 
on the polynomial requires access to all of its points. Evaluating a point 
on the piecewise cubic requires a fixed small number of computations, no 
matter how large the total number of points is. 

There are a variety of other numerical and technical issues in using interpolating 
splines as the number of points grows larger. See (De Boor, 2001) for more 
information. 

A cardinal spline has the disadvantage that it does not interpolate the first or
last point, which can be easily fixed by adding an extra point at either end of
the sequence. The cardinal spline is also less continuous, providing only $C^1$
continuity at the knots.

Figure 15.9. Splines interpolating nine control points (marked with small crosses). The
thick gray line shows an interpolating polynomial. The thin, dark line shows a Catmull-Rom
spline. The latter is made of seven cubic segments, which are each shown in alternating
gray tones.


15.6 Approximating Curves 

It might seem like the easiest way to control a curve is to specify a set of points
for it to interpolate. In practice, however, interpolation schemes often have
undesirable properties because they have less continuity and offer no control of what
happens between the points. Curve schemes that only approximate the points are
often preferred. With an approximating scheme, the control points influence the
shape of the curve, but do not specify it exactly. Although we give up the ability
to directly specify points for the curve to pass through, we gain better behavior
of the curve and local control. Should we need to interpolate a set of points, the
positions of the control points can be computed such that the curve passes through
these interpolation points.

The two most important types of approximating curves in computer graphics 
are Bezier curves and B-spline curves. 


15.6.1 Bezier Curves 

Bezier curves are one of the most common representations for free-form curves 
in computer graphics. The curves are named for Pierre Bezier, one of the people 
who was instrumental in their development. Bezier curves have an interesting 
history where they were concurrently developed by several independent groups. 

A Bezier curve is a polynomial curve that approximates its control points. The
curves can be a polynomial of any degree. A curve of degree d is controlled by
d + 1 control points. The curve interpolates its first and last control points, and
the shape is directly influenced by the other points.

Often, complex shapes are made by connecting a number of Bezier curves of
low degree, and in computer graphics, cubic (d = 3) Bezier curves are commonly
used for this purpose. Many popular illustration programs, such as Adobe
Illustrator, and font representation schemes, such as that used in PostScript, use cubic
Bezier curves. Bezier curves are extremely popular in computer graphics because
they are easy to control, have a number of useful properties, and there are very
efficient algorithms for working with them.

Bezier curves are constructed such that: 

• The curve interpolates the first and last control points, with u = 0 and 1, 
respectively. 

• The first derivative of the curve at its beginning (end) is determined by the 
vector between the first and second (next to last and last) control points. 
The derivatives are given by the vectors between these points scaled by the 
degree of the curve. 

• Higher derivatives at the beginning (end) of the curve depend on the points
at the beginning (end) of the curve. The nth derivative depends on the first
(last) n + 1 points.

For example, consider the Bezier curve of degree 3 as in Figure 15.10. The
curve has four (d + 1) control points. It begins at the first control point
($\mathbf{p}_0$) and ends at the last ($\mathbf{p}_3$). The first derivative at the
beginning is proportional to the vector between the first and second control points
($\mathbf{p}_1 - \mathbf{p}_0$). Specifically,
$\mathbf{f}'(0) = 3(\mathbf{p}_1 - \mathbf{p}_0)$. Similarly, the first derivative
at the end of the curve is given by $\mathbf{f}'(1) = 3(\mathbf{p}_3 - \mathbf{p}_2)$.
The second derivative at the beginning of the curve can be determined from control
points $\mathbf{p}_0$, $\mathbf{p}_1$, and $\mathbf{p}_2$.

Figure 15.10. A cubic Bezier curve is controlled by four points. It interpolates the first and
last, and the beginning and final derivatives are three times the vectors between the first two
(or last two) points.

Using the facts about Bezier cubics in the preceding paragraph, we can use the 
methods of Section 15.5 to create a parametric function for them. The definitions 
of the beginning and end interpolation and derivatives give 


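\[
\begin{aligned}
\mathbf{p}_0 &= \mathbf{f}(0) &&= \mathbf{a}_3 0^3 + \mathbf{a}_2 0^2 + \mathbf{a}_1 0 + \mathbf{a}_0, \\
\mathbf{p}_3 &= \mathbf{f}(1) &&= \mathbf{a}_3 1^3 + \mathbf{a}_2 1^2 + \mathbf{a}_1 1 + \mathbf{a}_0, \\
3(\mathbf{p}_1 - \mathbf{p}_0) &= \mathbf{f}'(0) &&= 3\mathbf{a}_3 0^2 + 2\mathbf{a}_2 0 + \mathbf{a}_1, \\
3(\mathbf{p}_3 - \mathbf{p}_2) &= \mathbf{f}'(1) &&= 3\mathbf{a}_3 1^2 + 2\mathbf{a}_2 1 + \mathbf{a}_1.
\end{aligned}
\]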


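This can be solved for the basis matrix

\[
\mathbf{B} =
\begin{pmatrix}
1 & 0 & 0 & 0 \\
-3 & 3 & 0 & 0 \\
3 & -6 & 3 & 0 \\
-1 & 3 & -3 & 1
\end{pmatrix}
\]

and then written as

\[
\mathbf{f}(u) = (1 - 3u + 3u^2 - u^3)\,\mathbf{p}_0 + (3u - 6u^2 + 3u^3)\,\mathbf{p}_1
              + (3u^2 - 3u^3)\,\mathbf{p}_2 + (u^3)\,\mathbf{p}_3,
\]

or

\[
\mathbf{f}(u) = \sum_{i=0}^{3} b_{i,3}\,\mathbf{p}_i,
\]

where the $b_{i,3}$ are the Bezier blending functions of degree 3:

\[
\begin{aligned}
b_{0,3} &= (1 - u)^3, \\
b_{1,3} &= 3u(1 - u)^2, \\
b_{2,3} &= 3u^2(1 - u), \\
b_{3,3} &= u^3.
\end{aligned}
\]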

Fortunately, the blending functions for Bezier curves have a special form that
works for all degrees. These functions are known as the Bernstein basis
polynomials and have the general form

\[
b_{k,n}(u) = C(n,k)\, u^k (1 - u)^{(n-k)},
\]

where n is the degree of the Bezier curve, and k is the blending function number
between 0 and n (inclusive). C(n, k) are the binomial coefficients:

\[
C(n,k) = \frac{n!}{k!\,(n - k)!}.
\]

Figure 15.11. Various Bezier segments of degree 2-6. The control points are shown with
crosses, and the control polygons (line segments connecting the control points) are also
shown.


Given the positions of the control points $\mathbf{p}_k$, the function to evaluate the
Bezier curve of degree n (with n + 1 control points) is

\[
\mathbf{p}(u) = \sum_{k=0}^{n} \mathbf{p}_k\, C(n,k)\, u^k (1 - u)^{(n-k)}.
\]

Some Bezier segments are shown in Figure 15.11. 
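
The Bernstein form translates almost literally into code. The following Python sketch evaluates a Bezier curve of any degree from its control points; it assumes points are tuples and uses `math.comb` from the standard library for the binomial coefficients. The names `bernstein` and `bezier_point` are hypothetical.

```python
from math import comb


def bernstein(k, n, u):
    """Bernstein basis polynomial b_{k,n}(u) = C(n,k) u^k (1-u)^(n-k)."""
    return comb(n, k) * u**k * (1 - u)**(n - k)


def bezier_point(control_points, u):
    """Point at parameter u on the Bezier curve with the given control points.

    The degree is len(control_points) - 1.
    """
    n = len(control_points) - 1
    dim = len(control_points[0])
    return tuple(
        sum(bernstein(k, n, u) * p[d] for k, p in enumerate(control_points))
        for d in range(dim)
    )
```

For a fixed u the blending functions sum to 1, so the result is a weighted average of the control points. Direct evaluation like this is convenient for low degrees; the de Casteljau algorithm described later in this section is an alternative that works by repeated linear interpolation.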

Bezier segments have several useful properties: 

• The curve is bounded by the convex hull of the control points. 

• Any line intersects the curve no more times than it intersects the set of 
line segments connecting the control points. This is called the variation 
diminishing property. This property is illustrated in Figure 15.12. 

• The curves are symmetric: reversing the order of the control points yields 
the same curve, with a reversed parameterization. 

• The curves are affine invariant. This means that translating, scaling,
rotating, or skewing the control points is the same as performing those
operations on the curve itself.

• There are good simple algorithms for evaluating and subdividing Bezier
curves into pieces that are themselves Bezier curves. Because subdivision
can be done effectively using the algorithm described later, a divide and
conquer approach can be used to create effective algorithms for important
tasks such as rendering Bezier curves, approximating them with line
segments, and determining the intersection between two curves.

Figure 15.12. The variation diminishing property of Bezier curves means that the curve
does not cross a line more than its control polygon does. Therefore, if the control polygon
has no “wiggles,” the curve will not have them either. B-splines (Section 15.6.2) also have
this property.


When Bezier segments are connected together to make a spline, connectivity
between the segments is created by sharing the endpoints. However, continuity
of the derivatives must be created by positioning the other control points. This
provides the user of a Bezier spline with control over the smoothness. For $G^1$
continuity, the second-to-last point of the first curve and the second point of the
second curve must be collinear with the equated endpoints. For $C^1$ continuity,
the distances between the points must be equal as well. This is illustrated
in Figure 15.13. Higher degrees of continuity can be created by properly
positioning more points.
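
These conditions are easy to test numerically. The Python sketch below classifies the join between two Bezier segments; it is an illustration under stated assumptions, namely that the points are 2D tuples, that both segments have the same degree (so that equal vectors imply equal derivatives), and that a small tolerance `tol` is acceptable. The function name `join_continuity` is hypothetical.

```python
def join_continuity(first, second, tol=1e-9):
    """Classify the join between two 2D Bezier segments of equal degree.

    Returns "disconnected", "C0", "G1", or "C1".
    """
    # The segments connect only if they share an endpoint.
    if any(abs(a - b) > tol for a, b in zip(first[-1], second[0])):
        return "disconnected"
    # Vector into the joint (last two points of the first segment) and
    # vector out of it (first two points of the second segment).
    incoming = (first[-1][0] - first[-2][0], first[-1][1] - first[-2][1])
    outgoing = (second[1][0] - second[0][0], second[1][1] - second[0][1])
    if all(abs(a - b) <= tol for a, b in zip(incoming, outgoing)):
        return "C1"  # same direction and same length
    cross = incoming[0] * outgoing[1] - incoming[1] * outgoing[0]
    dot = incoming[0] * outgoing[0] + incoming[1] * outgoing[1]
    if abs(cross) <= tol and dot > 0:
        return "G1"  # collinear and pointing the same way, lengths differ
    return "C0"
```

A drawing program could use a check of this kind to decide whether dragging one handle should also move its partner on the other side of the joint.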

Geometric Intuition for Bezier Curves 

Bezier curves can be derived from geometric principles, as well as from the
algebraic methods described above. We outline the geometric principles because they
provide intuition on how Bezier curves work.

Imagine that we have a set of control points from which we want to create
a smooth curve. Simply connecting the points with lines (to form the control
polygon) will lead to something that is non-smooth. It will have sharp corners. We
could imagine “smoothing” this polygon by cutting off the sharp corners, yielding
a new polygon that is smoother, but still not “smooth” in the mathematical sense
(since the curve is still a polygon, and therefore only $C^0$). We can repeat this
process, each time yielding a smoother polygon, as shown in Figure 15.14. In the
limit, that is, if we repeated the process infinitely many times, we would obtain a
$C^1$ smooth curve.

What we have done with corner cutting is define a subdivision scheme. That
is, we have defined curves by a process for breaking a simpler curve into smaller
pieces (i.e., subdividing it). The resulting curve is the limit curve that is achieved
by applying the process infinitely many times. If the subdivision scheme is
defined correctly, the result will be a smooth curve, and it will have a parametric
form.

Figure 15.13. Two Bezier segments connect to form a $C^1$ spline, because the
vector between the last two points of the first segment is equal to the vector
between the first two points of the second segment.

Figure 15.14. Subdivision procedure for quadratic Beziers. Each line segment is divided in
half and these midpoints are connected (gray points and lines). The interior control point is
moved to the midpoint of the new line segment (white circle).

Let us consider applying corner cutting to a single corner. Given three points
($\mathbf{p}_0$, $\mathbf{p}_1$, $\mathbf{p}_2$), we repeatedly “cut off the corners”
as shown in Figure 15.15. At each step, we divide each line segment in half, connect
the midpoints, and then move the corner point to the midpoint of the new line
segment. Note that in this process, new points are introduced, moved once, and then
remain in this position for any remaining iterations. The endpoints never move.

If we compute the “new” position for the corner point $\mathbf{p}_1$ as the midpoint
of the midpoints, we get the expression

\[
\mathbf{p}_1' = \tfrac{1}{2}\left(\tfrac{1}{2}\mathbf{p}_0 + \tfrac{1}{2}\mathbf{p}_1\right)
              + \tfrac{1}{2}\left(\tfrac{1}{2}\mathbf{p}_1 + \tfrac{1}{2}\mathbf{p}_2\right).
\]


The construction actually works for other proportions of distance along each
segment. If we let u be the fraction of the way from the beginning to the end of
each segment at which we place the middle point, we can re-write this expression as

\[
\mathbf{p}(u) = (1 - u)\big((1 - u)\mathbf{p}_0 + u\mathbf{p}_1\big)
              + u\big((1 - u)\mathbf{p}_1 + u\mathbf{p}_2\big).
\]

Regrouping terms gives the quadratic Bezier function:

\[
\mathbf{B}_2(u) = (1 - u)^2\,\mathbf{p}_0 + 2u(1 - u)\,\mathbf{p}_1 + u^2\,\mathbf{p}_2.
\]

Figure 15.15. By repeatedly cutting the corners off a polygon, we approach a smooth curve.

The De Casteljau Algorithm 

One nice feature of Bezier curves is that there is a very simple and general method
for computing and subdividing them. The method, called the de Casteljau
algorithm, uses a sequence of linear interpolations to compute the positions along the
Bezier curve of arbitrary order. It is the generalization of the subdivision scheme
described in the previous section.

The de Casteljau algorithm begins by connecting every adjacent set of points
with lines, and finding the point on these lines that is the u interpolation, giving a
set of n - 1 points. These points are then connected with straight lines, those lines
are interpolated (again by u), giving a set of n - 2 points. This process is repeated
until there is one point. An illustration of this process is shown in Figure 15.16.

The process of computing a point on a Bezier segment also provides a method
for dividing the segment at the point. The intermediate points computed during
the de Casteljau algorithm form the new control points of the new, smaller
segments, as shown in Figure 15.17.
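
A compact way to see both uses of the algorithm is to record the first and last point of every interpolation level. The Python sketch below does this; it assumes points are tuples, and the names `lerp` and `de_casteljau` are hypothetical.

```python
def lerp(a, b, u):
    """Linear interpolation between points a and b at parameter u."""
    return tuple((1 - u) * x + u * y for x, y in zip(a, b))


def de_casteljau(control_points, u):
    """Evaluate and subdivide a Bezier segment at parameter u.

    Returns (point, left, right), where left and right are the control
    points of the two smaller Bezier segments that meet at `point`.
    """
    level = [tuple(p) for p in control_points]
    left = [level[0]]
    right = [level[-1]]
    while len(level) > 1:
        # Interpolate every adjacent pair: one fewer point per level.
        level = [lerp(a, b, u) for a, b in zip(level, level[1:])]
        left.append(level[0])
        right.append(level[-1])
    right.reverse()
    return level[0], left, right
```

For a cubic with control points A, B, C, D, the returned lists correspond to the two sets of control points named in Figure 15.17: `left` is A, AB, AC, AD and `right` is AD, BD, CD, D.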

The existence of a good algorithm for dividing Bezier curves makes
divide-and-conquer algorithms possible. For example, when drawing a Bezier curve
segment, it is easy to check if the curve is close to being a straight line because it is
bounded by its convex hull. If the control points of the curve are all close to being
collinear, the curve can be drawn as a straight line. Otherwise, the curve can be
divided into smaller pieces, and the process can be repeated. Similar algorithms
can be used for determining the intersection between two curves. Because of the
existence of such algorithms, other curve representations are often converted to
Bezier form for processing.

Figure 15.16. An illustration of the de Casteljau algorithm for a cubic Bezier. The left-hand
image shows the construction for u = 0.5. The right-hand image shows the construction for
0.25, 0.5, and 0.75.

Figure 15.17. The de Casteljau algorithm is used to subdivide a cubic Bezier segment.
The initial points (black diamonds A, B, C, and D) are linearly interpolated to yield gray circles
(AB, BC, CD), which are linearly interpolated to yield white circles (AC, BD), which are linearly
interpolated to give the point on the cubic AD. This process also has subdivided the Bezier
segment with control points A, B, C, D into two Bezier segments with control points A, AB, AC,
AD and AD, BD, CD, D.
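
The drawing strategy just described might be sketched as follows. This is an illustrative Python version, not a definitive renderer: it assumes 2D points, always subdivides at u = 0.5, measures flatness as the distance from each interior control point to the chord between the endpoints, and caps the recursion with a `depth` parameter as a safeguard. The names `split_half`, `is_flat`, and `flatten` are hypothetical.

```python
import math


def split_half(cps):
    """de Casteljau subdivision at u = 0.5; returns (left, right)."""
    level = list(cps)
    left, right = [level[0]], [level[-1]]
    while len(level) > 1:
        level = [tuple((a + b) / 2 for a, b in zip(p, q))
                 for p, q in zip(level, level[1:])]
        left.append(level[0])
        right.append(level[-1])
    return left, right[::-1]


def _dist_to_segment(p, a, b):
    """Distance from 2D point p to the line segment from a to b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    denom = dx * dx + dy * dy
    if denom == 0:
        s = 0.0
    else:
        s = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / denom
        s = max(0.0, min(1.0, s))
    return math.hypot(p[0] - (a[0] + s * dx), p[1] - (a[1] + s * dy))


def is_flat(cps, tol):
    """True if every interior control point is within tol of the chord."""
    return all(_dist_to_segment(p, cps[0], cps[-1]) <= tol for p in cps[1:-1])


def flatten(cps, tol=0.01, depth=16):
    """Approximate a 2D Bezier segment by a polyline (list of points)."""
    if depth == 0 or is_flat(cps, tol):
        return [cps[0], cps[-1]]
    left, right = split_half(cps)
    # The two halves share the split point, so drop one copy of it.
    return flatten(left, tol, depth - 1)[:-1] + flatten(right, tol, depth - 1)
```

Because the curve lies inside the convex hull of its control points, a piece whose control points are all within `tol` of its chord cannot stray farther than `tol` from that chord, so the tolerance bounds the error of the polyline.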

15.6.2 B-splines 

B-splines provide a method for approximating a set of n points with a curve made
up of polynomials of degree d that gives $C^{(d-1)}$ continuity. Unlike the Bezier
splines of the previous section, B-splines allow curves to be generated for any
desired degree of continuity (almost up to the number of points). Because of
this, B-splines are a preferred way to specify very smooth curves (high degrees
of continuity) in computer graphics. If we want a $C^2$ or higher curve through an
arbitrary number of points, B-splines are probably the right method.

We can represent a curve using a linear combination of B-spline basis
functions. Since these basis functions are themselves splines, we call them basis
splines or B-splines for short. Each B-spline or basis function is made up of a
set of d + 1 polynomials each of degree d. The methods of B-splines provide
general procedures for defining these functions.

The term B-spline specifically refers to one of the basis functions, not the
function created by the linear combination of a set of B-splines. However, there
is inconsistency in how the term is used in computer graphics. Commonly, a
“B-spline curve” is used to mean a curve represented by the linear combination of
B-splines.

The idea of representing a polynomial as the linear combination of other
polynomials has been discussed in Sections 15.3.1 and 15.3.5. Representing a spline
as a linear combination of other splines was shown in Section 15.4.1. In fact, the
example given is a simple case of a B-spline.

The general notation for representing a function as a linear combination of
other functions is

\[
\mathbf{f}(t) = \sum_{i=1}^{n} \mathbf{p}_i\, b_i(t), \tag{15.15}
\]

where the $\mathbf{p}_i$ are the coefficients and the $b_i$ are the basis functions. If
the coefficients are points (e.g., 2- or 3-vectors), we refer to them as control points.
The key to making such a method work is to define the $b_i$ appropriately. B-splines
provide a very general way to do this.

A set of B-splines can be defined for a number of coefficients n and a parameter
value k (see footnote 3). The value of k is one more than the degree of the
polynomials used to make the B-splines (k = d + 1).

B-splines are important because they provide a very general method for
creating functions (that will be useful for representing curves) that have a number
of useful properties. A curve with n points made with B-splines with parameter
value k:

• is $C^{(k-2)}$ continuous;

• is made of polynomials of degree k — 1; 

• has local control—any site on the curve only depends on k of the control 
points; 

• is bounded by the convex hull of the points; 

• exhibits the variation diminishing property illustrated in Figure 15.12. 

A curve created using B-splines does not necessarily interpolate its control points. 

We will introduce B-splines by first looking at a specific, simple case to
introduce the concepts. We will then generalize the methods and show why they
are interesting. Because the method for computing B-splines is very general, we
delay introducing it until we have shown what these generalizations are.

3 The B-spline parameter is actually the order of the polynomials used in the B-splines. While this
terminology is not uniform in the literature, the use of the B-spline parameter k as a value one greater
than the polynomial degree is widely used, although some texts (see the chapter notes) write all of the
equations in terms of polynomial degree.

Uniform Linear B-splines

Consider a set of basis functions of the following form:

\[
b_{i,2}(t) =
\begin{cases}
t - i     & \text{if } i \le t < i + 1, \\
2 - t + i & \text{if } i + 1 \le t < i + 2, \\
0         & \text{otherwise.}
\end{cases}
\tag{15.16}
\]

Each of these functions looks like a little triangular “hat” between i and i + 2 with 
its peak at i + 1. Each is a piecewise polynomial, with knots at i, i + 1, and i + 2. 
Two of them are graphed in Figure 15.18. 

Each of these functions $b_{i,2}$ is a first degree (linear) B-spline. Because we will
consider B-splines of other parameter values later, we denote these with the 2 in
the subscript.

Notice that we have chosen to put the lower edge of the B-spline (its first knot)
at i. Therefore the first knot of the first B-spline (i = 1) is at 1. Iteration over the
B-splines or elements of the coefficient vector is from 1 to n (see Equation 15.15).
When B-splines are implemented, as well as in many other discussions of them,
they often are numbered from 0 to n - 1.

We can create a function from a set of n control points using Equation 15.15,
with these functions used for the $b_i$ to create an “overall function” that is
influenced by the coefficients. If we were to use these (k = 2) B-splines to define the
overall function, we would define a piecewise polynomial function that linearly
interpolates the coefficients $\mathbf{p}_i$ between t = k and t = n + 1. Note that while
(k = 2) B-splines interpolate all of their coefficients, B-splines of higher degree
do this under some specific conditions that we will discuss in Section 15.6.3.
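
The linear case is small enough to check by hand or in a few lines of code. The Python sketch below evaluates the "hat" functions of Equation 15.16 and the overall function of Equation 15.15 for scalar coefficients, keeping the text's numbering from 1; the names `b_linear` and `linear_bspline_curve` are hypothetical.

```python
def b_linear(i, t):
    """Uniform linear B-spline b_{i,2}(t): a hat from i to i + 2, peak at i + 1."""
    if i <= t < i + 1:
        return t - i
    if i + 1 <= t < i + 2:
        return 2 - t + i
    return 0.0


def linear_bspline_curve(coeffs, t):
    """Overall function: sum of p_i * b_{i,2}(t) with i running from 1 to n.

    coeffs[0] holds p_1, matching the 1-based numbering used in the text.
    """
    return sum(p * b_linear(i, t) for i, p in enumerate(coeffs, start=1))
```

With coefficients 5, 7, 4 (so n = 3), the function takes the values 5, 7, and 4 at t = 2, 3, and 4, and the value 6 halfway between the first two, which is the linear interpolation over the range t = k to t = n + 1 described above.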

Some properties of B-splines can be seen in this simple case. We will write
these in the general form using k, the parameter, and n for the number of
coefficients or control points.

• Each B-spline has k + 1 knots.

• Each B-spline is zero before its first knot and after its last knot.

• The overall spline has local control because each coefficient is only
multiplied by one B-spline, and this B-spline is non-zero only between k + 1
knots.

• The overall spline has n + k knots.

• Each B-spline is $C^{(k-2)}$ continuous, therefore the overall spline is $C^{(k-2)}$
continuous.

• The set of B-splines sums to 1 for all parameter values between knots k
and n + 1. This range is where there are k B-splines that are non-zero.
Summing to 1 is important because it means that the B-splines are shift
invariant: translating the control points will translate the entire curve.

• Between each of its knots, the B-spline is a single polynomial of degree
d = k - 1. Therefore, the overall curve (that sums these together) can also
be expressed as a single, degree d polynomial between any adjacent knots.

In this example, we have chosen the knots to be uniformly spaced. We will
consider B-splines with non-uniform spacing later. When the knot spacing is uniform,
the B-splines are identical except for being shifted. B-splines with
uniform knot spacing are sometimes called uniform B-splines or periodic B-splines.

Uniform Quadratic B-splines 

The properties of B-splines listed in the previous section were intentionally
written for arbitrary n and k. A general procedure for constructing the B-splines will
be provided later, but first, let's consider another specific case with k = 3.

The B-spline $b_{2,3}$ is shown in Figure 15.19. It is made of quadratic pieces
(degree 2), and has 3 of them. It is $C^1$ continuous and is non-zero only within
the 4 knots that it spans. Notice that a quadratic B-spline is made of 3 pieces,
one between knot 1 and 2, one between knot 2 and 3, and one between knot 3
and 4. In Section 15.6.3 we will see a general procedure for building these
functions. For now, we simply examine these functions:

\[
b_{i,3}(t) =
\begin{cases}
\tfrac{1}{2}u^2             & \text{if } i \le t < i + 1,     & u = t - i, \\
-u^2 + u + \tfrac{1}{2}     & \text{if } i + 1 \le t < i + 2, & u = t - (i + 1), \\
\tfrac{1}{2}(1 - u)^2       & \text{if } i + 2 \le t < i + 3, & u = t - (i + 2), \\
0                           & \text{otherwise.}
\end{cases}
\tag{15.17}
\]

In order to make the expressions simpler, we wrote the function for each part as
if it applied over the range 0 to 1.

If we evaluate the overall function made from summing together the B-splines,
at any time only k (3 in this case) of them are non-zero. One of them will be in
the first part of Equation 15.17, one will be in the second part, and one will be
in the third part. Therefore, we can think of any piece of the overall function as
being made up of a degree d = k - 1 polynomial that depends on k coefficients.
For the k = 3 case, we can write

\[
\mathbf{f}(u) = \tfrac{1}{2}(1 - u)^2\,\mathbf{p}_i + \left(-u^2 + u + \tfrac{1}{2}\right)\mathbf{p}_{i+1}
              + \tfrac{1}{2}u^2\,\mathbf{p}_{i+2},
\]

where u = t - (i + 2). This defines the piece of the overall function when
i + 2 ≤ t < i + 3, which is where $b_{i,3}$ is in its third part, $b_{i+1,3}$ in its
second, and $b_{i+2,3}$ in its first.

If we have a set of n points, we can use the B-splines to create a curve. If we
have seven points, we will need a set of seven B-splines. A set of seven B-splines
for k = 3 is shown in Figure 15.20. Notice that there are n + k (10) knots and that
the sum of the B-splines is 1 over the range k to n + 1 (knots 3 through 8). A
curve specified using these B-splines and a set of points is shown in Figure 15.21.

Figure 15.19. The B-spline $b_{2,3}$ with uniform knot spacing.

Figure 15.20. The set of seven B-splines with k = 3 and uniform knot spacing
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10].

Figure 15.21. Curve made from seven quadratic (k = 3) B-splines, using seven control
points.
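
The claim about the sum of the B-splines can be checked numerically. The Python sketch below implements the uniform quadratic B-spline piece by piece and sums seven of them; it assumes the 1-based numbering of the text and half-open intervals for each piece, and the names `b_quadratic`, `basis_sum`, and `quadratic_bspline_curve` are hypothetical.

```python
def b_quadratic(i, t):
    """Uniform quadratic B-spline b_{i,3}(t) with knots i, i+1, i+2, i+3."""
    if i <= t < i + 1:
        u = t - i
        return 0.5 * u * u
    if i + 1 <= t < i + 2:
        u = t - (i + 1)
        return -u * u + u + 0.5
    if i + 2 <= t < i + 3:
        u = t - (i + 2)
        return 0.5 * (1 - u) ** 2
    return 0.0


def basis_sum(n, t):
    """Sum of the n B-splines b_{1,3} .. b_{n,3} at parameter t."""
    return sum(b_quadratic(i, t) for i in range(1, n + 1))


def quadratic_bspline_curve(points, t):
    """Curve point: sum of p_i * b_{i,3}(t), with points[0] holding p_1."""
    dim = len(points[0])
    return tuple(
        sum(b_quadratic(i, t) * p[d] for i, p in enumerate(points, start=1))
        for d in range(dim)
    )
```

For n = 7 the sum is 1 for parameter values from 3 up to 8 and falls below 1 outside that range, where fewer than three B-splines are non-zero. Inside the range, adding the same offset to every control point moves the curve point by exactly that offset.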


Uniform Cubic B-splines 

Because cubic polynomials are so popular in computer graphics, the special case
of B-splines with k = 4 is sufficiently important that we consider it before
discussing the general case. A B-spline of third degree is defined by 4 cubic
polynomial pieces. The general process by which these pieces are determined is
described later, but the result is

\[
b_{i,4}(t) =
\begin{cases}
\tfrac{1}{6}u^3                          & \text{if } i \le t < i + 1,     & u = t - i, \\
\tfrac{1}{6}(-3u^3 + 3u^2 + 3u + 1)      & \text{if } i + 1 \le t < i + 2, & u = t - (i + 1), \\
\tfrac{1}{6}(3u^3 - 6u^2 + 4)            & \text{if } i + 2 \le t < i + 3, & u = t - (i + 2), \\
\tfrac{1}{6}(-u^3 + 3u^2 - 3u + 1)       & \text{if } i + 3 \le t < i + 4, & u = t - (i + 3), \\
0                                        & \text{otherwise.}
\end{cases}
\tag{15.18}
\]


This degree 3 B-spline is graphed for i = 1 in Figure 15.22.

We can write the function for the overall curve between knots i + 3 and i + 4
as a function of the parameter u between 0 and 1 and the four control points that
influence it:

\[
\begin{aligned}
\mathbf{f}(u) = {} & \tfrac{1}{6}(-u^3 + 3u^2 - 3u + 1)\,\mathbf{p}_i + \tfrac{1}{6}(3u^3 - 6u^2 + 4)\,\mathbf{p}_{i+1} \\
                   & + \tfrac{1}{6}(-3u^3 + 3u^2 + 3u + 1)\,\mathbf{p}_{i+2} + \tfrac{1}{6}u^3\,\mathbf{p}_{i+3}.
\end{aligned}
\]



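Figure 15.22. The cubic (k = 4) B-spline with uniform knots.

This can be re-written using the matrix notation of the previous sections,
giving a basis matrix for cubic B-splines of

\[
\mathbf{M}_b = \frac{1}{6}
\begin{pmatrix}
-1 & 3 & -3 & 1 \\
3 & -6 & 3 & 0 \\
-3 & 0 & 3 & 0 \\
1 & 4 & 1 & 0
\end{pmatrix}.
\]

Here the rows correspond to the powers $u^3$, $u^2$, $u$, and 1 (in that order), and
the columns to the four control points that influence the piece.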

Unlike the matrices that were derived from constraints in Section 15.5, this
matrix is created from the polynomials that are determined by the general B-spline
procedure defined in the next section.
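
As an illustration of how this matrix is used, the Python sketch below computes the four blending weights for a local parameter u and combines four control points with them. It assumes the rows of the matrix correspond to the powers u^3, u^2, u, 1 and that a common factor of 1/6 is applied, which is the reading consistent with Equation 15.18. The names `CUBIC_BSPLINE`, `cubic_bspline_weights`, and `cubic_bspline_point` are hypothetical.

```python
# Rows correspond to u^3, u^2, u, 1; columns to four consecutive control points.
CUBIC_BSPLINE = (
    (-1, 3, -3, 1),
    (3, -6, 3, 0),
    (-3, 0, 3, 0),
    (1, 4, 1, 0),
)


def cubic_bspline_weights(u):
    """Blending weights of the four control points at local parameter u."""
    powers = (u**3, u**2, u, 1.0)
    return tuple(
        sum(powers[r] * CUBIC_BSPLINE[r][c] for r in range(4)) / 6.0
        for c in range(4)
    )


def cubic_bspline_point(p0, p1, p2, p3, u):
    """Point on one piece of a uniform cubic B-spline curve, u in [0, 1]."""
    weights = cubic_bspline_weights(u)
    points = (p0, p1, p2, p3)
    return tuple(
        sum(w * p[d] for w, p in zip(weights, points))
        for d in range(len(p0))
    )
```

At u = 0 the weights are 1/6, 4/6, 1/6, and 0, so a piece starts near, but generally not on, its second control point. Sliding the window of four points along by one and evaluating at u = 0 gives the same point as the previous window at u = 1, so consecutive pieces join.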



15.6.3 Non-uniform B-splines 

One nice feature of B-splines is that they can be defined for any k > 1. So if we 
need a smoother curve, we can simply increase the value of k. This is illustrated 
in Figure 15.1. 

So far, we have said that B-splines generalize to any k > 1 and any n > d.
There is one last generalization to introduce before we show how to actually
compute these B-splines. B-splines are defined for any non-decreasing knot vector.

For a given n and k, the set of B-splines (and the function created by their
linear combination) has n + k knots. We can write the value of these knots as
a vector, which we will denote as t. For the uniform B-splines, the knot vector is
[1, 2, 3, ..., n + k]. However, B-splines can be generated for any knot vector of
length n + k, provided the values are non-decreasing (i.e., $t_{i+1} \ge t_i$).

There are two main reasons why non-uniform knot spacing is useful: it gives
us control over what parameter range of the overall function each coefficient
affects, and it allows us to repeat knots (e.g., create knots with no spacing in
between) in order to create functions with different properties around these points.
The latter will be considered later in this section.

Figure 15.1. B-spline curves using the same uniform set of knots and the same control
points, for various values of k. Note that as k increases, the valid parameter range for the
curve shrinks.

The ability to specify knot values for B-splines is similar to being able to
specify the interpolation sites for interpolating spline curves. It allows us to associate
curve features with parameter values. By specifying a non-uniform knot vector,
we specify what parameter range each coefficient of a B-spline curve affects.
Remember that B-spline i is non-zero only between knot i and knot i + k. Therefore,
the coefficient associated with it only affects the curve between these parameter
values.

One place where control over knot values is particularly useful is in inserting
or deleting knots near the beginning of a sequence. To illustrate this, consider a
curve defined using linear B-splines (k = 2) as discussed in Section 15.6.2. For
n = 4, the uniform knot vector is [1, 2, 3, 4, 5, 6]. This curve is controlled by a
set of four points and spans the parameter range t = 2 to t = 5. The “end” of
the curve (t = 5) interpolates the last control point. If we insert a new point in
the middle of the point set, we would need a longer knot vector. The locality
properties of the B-splines prevent this insertion from affecting the values of the
curve at the ends. The longer curve would still interpolate its last control point
at its end. However, if we chose to keep the uniform knot spacing, the new knot
vector would be [1, 2, 3, 4, 5, 6, 7]. The end of the curve would be at t = 6, and the
parameter value at which the last control point is interpolated will be a different
parameter value than before the insertion. With non-uniform knot spacing, we can
use the knot vector [1, 2, 3, 3.5, 4, 5, 6] so that the ends of the curve are unaffected
by the change. The ability to have non-uniform knot spacing makes the locality
property of B-splines an algebraic property, as well as a geometric one.

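We now introduce the general method for defining B-splines. Given values
for the number of coefficients n, the B-spline parameter k, and the knot vector t
(which has length n + k), the following recursive equations define the B-splines:

$$b_{i,1,\mathbf{t}}(t) = \begin{cases} 1 & \text{if } t_i \le t < t_{i+1}, \\ 0 & \text{otherwise,} \end{cases}$$

$$b_{i,k,\mathbf{t}}(t) = \frac{t - t_i}{t_{i+k-1} - t_i}\, b_{i,k-1}(t) + \frac{t_{i+k} - t}{t_{i+k} - t_{i+1}}\, b_{i+1,k-1}(t). \qquad (15.20)$$

This equation is known as the Cox-de Boor recurrence. It may be used to compute
specific values for specific B-splines. However, it is more often applied
algebraically to derive equations such as Equation 15.17 or 15.18.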
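As a hedged illustration of the numerical use of the Cox-de Boor recurrence, the following Python sketch evaluates a single B-spline straight from the recursive definition. The function name `bspline`, the 1-based lookup helper `knot`, and the convention that a term with a zero-width denominator is dropped (needed once knots repeat) are assumptions of this sketch, not something the text prescribes.

```python
def bspline(i, k, knots, t):
    """Value at t of B-spline i with parameter k (i is 1-based; knots[0] is knot 1)."""
    def knot(j):
        return knots[j - 1]            # 1-based knot lookup

    if k == 1:
        # Base case: a "switch" that is on only for knot(i) <= t < knot(i + 1).
        return 1.0 if knot(i) <= t < knot(i + 1) else 0.0

    value = 0.0
    left = knot(i + k - 1) - knot(i)
    if left > 0:                       # assumption: zero-width terms contribute nothing
        value += (t - knot(i)) / left * bspline(i, k - 1, knots, t)
    right = knot(i + k) - knot(i + 1)
    if right > 0:
        value += (knot(i + k) - t) / right * bspline(i + 1, k - 1, knots, t)
    return value


uniform = list(range(1, 11))           # uniform knot vector [1, ..., 10], so n = 7 for k = 3
sample = bspline(2, 3, uniform, 2.5)   # quadratic B-spline 2 evaluated at t = 2.5
```

The direct recursion re-evaluates lower-order B-splines many times, so it is only meant to mirror the definition; a production evaluator would normally tabulate the intermediate values.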

As an example, consider how we would have derived Equation 15.17. Using
a uniform knot vector [1, 2, 3, ...], so that $t_i = i$, and the value k = 3 in Equation 15.20
yields

$$b_{i,3}(t) = \frac{t - i}{(i + 2) - i}\, b_{i,2} + \frac{(i + 3) - t}{(i + 3) - (i + 1)}\, b_{i+1,2} = \frac{1}{2}(t - i)\, b_{i,2} + \frac{1}{2}(i + 3 - t)\, b_{i+1,2}. \qquad (15.21)$$


Continuing the recurrence, we must evaluate the recursive expressions:

$$b_{i,2}(t) = \frac{t - i}{(i + 2 - 1) - i}\, b_{i,1} + \frac{(i + 2) - t}{(i + 2) - (i + 1)}\, b_{i+1,1} = (t - i)\, b_{i,1} + (i + 2 - t)\, b_{i+1,1},$$

$$b_{i+1,2}(t) = \frac{t - (i + 1)}{((i + 1) + 2 - 1) - (i + 1)}\, b_{i+1,1} + \frac{((i + 1) + 2) - t}{((i + 1) + 2) - ((i + 1) + 1)}\, b_{(i+1)+1,1} = (t - i - 1)\, b_{i+1,1} + (i + 3 - t)\, b_{i+2,1}.$$

Inserting these results into Equation 15.21 gives:

$$b_{i,3}(t) = \frac{1}{2}(t - i)\big((t - i)\, b_{i,1} + (i + 2 - t)\, b_{i+1,1}\big) + \frac{1}{2}(i + 3 - t)\big((t - i - 1)\, b_{i+1,1} + (i + 3 - t)\, b_{i+2,1}\big).$$

To see that this expression is equivalent to Equation 15.17, we note that each
of the (k = 1) B-splines is like a switch, turning on only for a particular parameter
range. For instance, $b_{i,1}$ is only non-zero between i and i + 1. So, if $i \le t < i + 1$,
only the first of the (k = 1) B-splines in the expression is non-zero, so

$$b_{i,3}(t) = \frac{1}{2}(t - i)^2 \quad \text{if } i \le t < i + 1.$$

Similar manipulations give the other parts of Equation 15.17. 


Repeated Knots and B-spline Interpolation 

While B-splines have many nice properties, functions defined using them generally
do not interpolate the coefficients. This can be inconvenient if we are using
them to define a curve that we want to interpolate a specific point. We give a
brief overview of how to interpolate a specific point using B-splines here. A more
complete discussion can be found in the books listed in the chapter notes.

Figure 15.23. A curve parameterized by quadratic B-splines (k = 3) with seven control
points. On the left, the uniform knot vector [1,2,3,4,5,6,7,8,9,10] is used. On the right, the
non-uniform knot spacing [1,2,3,4,4,6,7,8,8,10] is used. The duplication of the 4th and 8th knot
means that all interior knots of the 3rd and 7th B-spline are equal, so the curve interpolates
the control points associated with those B-splines.


One way to cause B-splines to interpolate their coefficients is to repeat knots. 
If all of the interior knots for a particular B-spline have the same value, then the 
overall function will interpolate this B-spline’s coefficient. An example of this is 
shown in Figure 15.23. 

Interpolation by repeated knots comes at a high cost: it removes the smoothness
of the B-spline and the resulting overall function and represented curve.
However, at the beginning and end of the spline, where continuity is not an issue,
knot repetition is useful for creating endpoint interpolating B-splines. While
the first (or last) knot’s value is not important for interpolation, for simplicity, we
make the first (or last) k knots have the same value to achieve interpolation.

Figure 15.24. Endpoint-interpolating quadratic (k = 3) B-splines, for n = 8. The knot vector
is [0,0,0,1,2,3,4,5,6,6,6]. The first and last two B-splines are aperiodic, while the middle four
(shown as dotted lines) are periodic and identical to the ones in Figure 15.20.

Endpoint interpolating quadratic B-splines are shown in Figure 15.24. The
first two and last two B-splines are different from the uniform ones. Their expressions
can be derived through the use of the Cox-de Boor recurrence:

$$b_{1,3,[0,0,0,1,2,\ldots]}(t) = \begin{cases} (1 - t)^2 & \text{if } 0 \le t < 1, \\ 0 & \text{otherwise,} \end{cases}$$

$$b_{2,3,[0,0,0,1,2,\ldots]}(t) = \begin{cases} 2u - \frac{3}{2}u^2 & \text{if } 0 \le t < 1,\ u = t, \\ \frac{1}{2}(1 - u)^2 & \text{if } 1 \le t < 2,\ u = t - 1, \\ 0 & \text{otherwise.} \end{cases}$$
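The effect of repeated knots can be checked numerically. The sketch below is only an illustration: it re-implements the Cox-de Boor recurrence with the assumed convention that zero-width terms are dropped, then compares the first two endpoint-interpolating B-splines against their closed forms and evaluates a scalar function built on the right-hand knot vector of Figure 15.23. The names `bspline` and `spline_value` and the coefficient values are hypothetical.

```python
def bspline(i, k, knots, t):
    """Cox-de Boor recurrence; i is 1-based, terms with a zero denominator are dropped."""
    if k == 1:
        return 1.0 if knots[i - 1] <= t < knots[i] else 0.0
    value = 0.0
    left = knots[i + k - 2] - knots[i - 1]
    if left > 0:
        value += (t - knots[i - 1]) / left * bspline(i, k - 1, knots, t)
    right = knots[i + k - 1] - knots[i]
    if right > 0:
        value += (knots[i + k - 1] - t) / right * bspline(i + 1, k - 1, knots, t)
    return value


def spline_value(coeffs, k, knots, t):
    """Linear combination of B-splines with the given (scalar) coefficients."""
    return sum(c * bspline(i, k, knots, t) for i, c in enumerate(coeffs, start=1))


clamped = [0, 0, 0, 1, 2, 3, 4, 5, 6, 6, 6]     # knot vector of Figure 15.24
repeated = [1, 2, 3, 4, 4, 6, 7, 8, 8, 10]      # right-hand knot vector of Figure 15.23

t = 0.25
first = bspline(1, 3, clamped, t)               # closed form: (1 - t)^2
second = bspline(2, 3, clamped, t)              # closed form: 2t - (3/2) t^2

coeffs = [5.0, 1.0, 7.0, 2.0, 4.0, 3.0, 9.0]    # arbitrary illustrative coefficients
at_repeated_knot = spline_value(coeffs, 3, repeated, 4.0)   # reproduces coeffs[2]
```

Because the base case uses a half-open interval, this sketch returns zero exactly at the last knot value; evaluating just below it is one simple workaround.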


15.6.4 NURBS 


Despite all of the generality B-splines provide, there are some functions that cannot
be exactly represented using them. In particular, B-splines cannot represent
conic sections. To represent such curves, a ratio of two polynomials is used.
Non-uniform B-splines are used to represent both the numerator and the denominator.
The most general form of these are non-uniform rational B-splines, or NURBS
for short.

NURBS associate a scalar weight $h_i$ with every control point $p_i$ and use the
same B-splines for both the numerator and the denominator:

$$f(u) = \frac{\sum_{i=1}^{n} h_i\, p_i\, b_{i,k,\mathbf{t}}(u)}{\sum_{i=1}^{n} h_i\, b_{i,k,\mathbf{t}}(u)},$$

where $b_{i,k,\mathbf{t}}$ are the B-splines with parameter k and knot vector t.

NURBS are very widely used to represent curves and surfaces in geometric 
modeling because of the amazing versatility they provide, in addition to the useful 
properties of B-splines. 
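To make the conic-section claim concrete, here is a small Python sketch that evaluates a NURBS curve as a weighted ratio of B-spline sums. The control points, weights, and knot vector are a standard textbook choice for a quarter of the unit circle and are assumptions of this example, as are the function names; the B-spline evaluator again drops zero-width terms.

```python
import math


def bspline(i, k, knots, t):
    """Cox-de Boor recurrence; i is 1-based, terms with a zero denominator are dropped."""
    if k == 1:
        return 1.0 if knots[i - 1] <= t < knots[i] else 0.0
    value = 0.0
    left = knots[i + k - 2] - knots[i - 1]
    if left > 0:
        value += (t - knots[i - 1]) / left * bspline(i, k - 1, knots, t)
    right = knots[i + k - 1] - knots[i]
    if right > 0:
        value += (knots[i + k - 1] - t) / right * bspline(i + 1, k - 1, knots, t)
    return value


def nurbs_point(points, weights, k, knots, u):
    """Ratio of the weighted control-point sum to the weight sum at parameter u."""
    numerator = [0.0] * len(points[0])
    denominator = 0.0
    for i, (p, h) in enumerate(zip(points, weights), start=1):
        w = h * bspline(i, k, knots, u)
        denominator += w
        numerator = [acc + w * c for acc, c in zip(numerator, p)]
    return tuple(acc / denominator for acc in numerator)


# Assumed example data: a quadratic NURBS (k = 3) tracing a quarter of the unit circle.
points = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
weights = [1.0, math.sqrt(0.5), 1.0]
knots = [0, 0, 0, 1, 1, 1]

x, y = nurbs_point(points, weights, 3, knots, 0.5)   # valid for 0 <= u < 1
radius = math.hypot(x, y)                            # distance from the origin
```

With all weights equal to one the same code reduces to an ordinary B-spline curve, which would only approximate the arc.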


15.7 Summary 

In this chapter, we have discussed a number of representations for free-form 
curves. The most important ones for computer graphics are: 

• Cardinal splines use a set of cubic pieces to interpolate control points. They 
are generally preferred to interpolating polynomials because they are local 
and easier to evaluate. 

• Bezier curves approximate their control points and have many useful properties
and associated algorithms. For this reason, they are popular in graphics
applications.

• B-spline curves represent the curve as a linear combination of B-spline
functions. They are general and have many useful properties such as being
bounded by their convex hull and being variation diminishing. B-splines
are often used when smooth curves are desired.

Notes

The problem of representing shapes mathematically is an entire field unto itself,
generally known as Geometric Modeling. Representing curves is just the beginning
and is generally a precursor to modeling surfaces and solids. A more
thorough discussion of curves can be found in most geometric modeling texts;
see, for example, Geometric Modeling (Mortenson, 1985) for a text that is accessible
to computer graphics students. Many geometric modeling books specifically
focus on smooth curves and surfaces. Texts such as An Introduction to Splines
for Use in Computer Graphics (Bartels et al., 1987), Curves and Surfaces for
CAGD: A Practical Guide (Farin, 2002) and Geometric Modeling with Splines:
An Introduction (E. Cohen et al., 2001) provide considerable detail about curve
and surface representations. Other books focus on the mathematics of splines; A
Practical Guide to Splines (De Boor, 2001) is a standard reference.

The history of the development of curve and surface representations is complex;
see the chapter by Farin in Handbook of Computer Aided Geometric Design
(Farin et al., 2002) or the book on the subject An Introduction to NURBS: With
Historical Perspective (D. F. Rogers, 2000) for a discussion. Many ideas were
independently developed by multiple groups who approached the problems from
different disciplines. Because of this, it can be difficult to attribute ideas to a single
person or to point at the “original” sources. It has also led to a diversity of
notation, terminology, and ways of introducing the concepts in the literature.


15.7.1 Exercises 

For Exercises 1-4, find the constraint matrix, the basis matrix, and the basis functions.
To invert the matrices you can use a program such as MATLAB or OCTAVE
(a free MATLAB-like system).

1. A line segment: parameterized with $p_0$ located 25% of the way along the
segment (u = 0.25), and $p_1$ located 75% of the way along the segment.

2. A quadratic: parameterized with $p_0$ as the position of the beginning point
(u = 0), $p_1$, the first derivative at the beginning point, and $p_2$, the second
derivative at the beginning point.

3. A cubic: its control points are equally spaced ($p_0$ has u = 0, $p_1$ has u =
1/3, $p_2$ has u = 2/3, and $p_3$ has u = 1).

4. A quintic (a degree five polynomial, so the matrices will be 6 × 6), where $p_0$
is the beginning position, $p_1$ is the beginning derivative, $p_2$ is the middle
(u = 0.5) position, $p_3$ is the first derivative at the middle, $p_4$ is the position
at the end, and $p_5$ is the first derivative at the end.

5. The Lagrange Form (Equation (15.12)) can be used to represent the interpolating
cubic of Exercise 3. Use it at several different parameter values to
confirm that it does produce the same results as the basis functions derived
in Exercise 3.

6. Devise an arc-length parameterization for the curve represented by the parametric
function

$$f(u) = (u, u^2).$$

7. Given the four control points of a segment of a Hermite spline, compute the
control points of an equivalent Bezier segment.

8. Use the de Casteljau algorithm to evaluate the position of the cubic Bezier
curve with its control points at (0, 0), (0, 1), (1, 1) and (1, 0) for parameter
values u = 0.5 and u = 0.75. Drawing a sketch will help you do this.

9. Use the Cox-de Boor recurrence to derive Equation (15.16).



Implicit Modeling 


Implicit modeling (also known as implicit surfaces) in computer graphics covers
many different methods for defining models. These include skeletal implicit modeling,
offset surfaces, level sets, variational surfaces, and algebraic surfaces. In
this chapter we briefly touch on these methods and describe how to build skeletal
implicit models in more detail. Curves can be defined by implicit equations of the
form

$$f(x, y) = 0.$$

If we consider a closed curve, such as a circle, with radius r, then the implicit
equation can be written as

$$f(x, y) = x^2 + y^2 - r^2 = 0. \qquad (16.1)$$

The value of f(x, y) can be positive (outside the circle), negative (inside the
circle), or zero for points precisely on the circle. The equivalent in three dimensions
is a closed surface around a set of points that occupy a given volume or
region of space. The volume forms a scalar field, i.e., we can compute a value for
every point, and, as can be seen for the circle, the negative values are bounded by
the implicit curve or surface. The surface can be visualized as a contour in the
field, connecting points with a particular value such as zero (see Equation (16.1)).
Computing such a surface implies searching through space to find the points that
satisfy the implicit equation; this method is unlikely to lead to an efficient algorithm
for circle drawing (and even less likely in three dimensions). This was
perhaps the reason that algorithmic methods for modeling with parametric curves
and surfaces were investigated before implicit methods; however, there are some
good reasons to develop algorithms to visualize implicit surfaces. Chapter 28
mentions scalar fields in the context of volume visualization. In this chapter we
explore the implications of deriving the data from a modeling process rather than
from a scanner.
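The sign convention for the circle of Equation (16.1) can be written down directly. The following Python sketch is only illustrative; the function names and the tolerance `eps` used to decide that a point is "on" the curve are assumptions made here.

```python
def circle_field(x, y, r):
    """Implicit function of Equation (16.1): zero exactly on the circle of radius r."""
    return x * x + y * y - r * r


def classify(x, y, r, eps=1e-12):
    """Inside/outside test based on the sign of the field value."""
    value = circle_field(x, y, r)
    if abs(value) <= eps:          # assumption: small tolerance for floating-point noise
        return "on"
    return "inside" if value < 0 else "outside"


labels = [classify(0.0, 0.0, 1.0), classify(1.0, 0.0, 1.0), classify(2.0, 0.0, 1.0)]
```

Note that the function answers "where is this point?" in constant time, but offers no direct way to enumerate the points on the circle, which is the search problem the paragraph above describes.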

Despite the computational overhead of finding the implicit surface, designing
with implicit modeling techniques offers some advantages over other modeling
methods. Many geometric operations are simplified using implicit methods,
including:

• the definition of blends;

• the standard set operations (union, intersection, difference, etc.) of constructive
solid geometry (CSG);

• functional composition with other implicit functions (e.g., R-functions,
Barthe blends, Ricci blends, and warping);

• inside/outside tests (e.g., for collision detection).

Figure 16.1. Blinn's Blobby Man 1980. Image courtesy Jim Blinn.

Visualizing the surfaces can be done either by direct ray tracing using an algorithm 
as described in (Kalra & Barr, 1989; Mitchell, 1990; Hart & Baker, 1996; deGroot 
& Wyvill, 2005) or by first converting to polygons (Wyvill et al., 1986). 

One of the first methods was proposed by Ricci as far back as 1973 (Ricci,
1973), who also introduced CSG in the same paper. Jim Blinn’s algorithm for
finding contours in electron density fields, known as Blobby molecules (J. Blinn,
1982), Nishimura’s Metaballs (Nishimura et al., 1985) and Wyvills’ Soft Objects
(Wyvill et al., 1986) were all early examples of implicit modeling methods.
Jim Blinn’s Blobby Man (see Figure 16.1) was the first rendering of a
non-algebraic implicit model.


16.1 Implicit Functions, Skeletal Primitives 
and Summation Blending 

In the context of modeling, an implicit function is defined as a function f applied
to a point $p \in \mathbb{E}^3$ yielding a scalar value in $\mathbb{R}$.

The implicit function $f_i(x, y, z)$ may be split into a distance function
$d_i(x, y, z)$ and a fall-off filter function¹ $g_i(r)$, where r stands for the distance
from the skeleton and the subscript refers to the ith skeletal element.

¹ These functions have been given many names by researchers in the past, e.g., filter, potential,
radial-basis, kernel, but we use fall-off filter as a simple term to describe their appearance.

Figure 16.2. Fall-off filter functions (0 < r < 1). (a) Blinn's Gaussian or “blobby” function;
(b) Nishimura’s “metaball” function; (c) Wyvill et al.'s “soft objects” function; (d) the Wyvill
function.

We will use the following notation:

$$f_i(x, y, z) = g_i \circ d_i(x, y, z). \qquad (16.2)$$

A simple example is a point primitive, and we take the analogy of a star radiating
heat into space. The field value (temperature in this example) may be
measured at any point p and can be found by taking the distance from p to the
center of the star and supplying the value to a fall-off filter function similar to
one of those given in Figure 16.2. In these sample functions, the field is given a
value of 1 at the center of the star; the value falls off with distance. The surface
of a model may be derived from the implicit function f(x, y, z) as the points of
space whose values are equal to some desired iso-value (iso); in the star example,
a spherical shell for values of $iso \in (0, 1)$.

In general, filter functions ($g_i$) are chosen so that the field values are maximized
on the skeleton and fall off to zero at some chosen distance from the
skeleton. In the simple case where the resulting surfaces are blended together,
the global field f(x, y, z) of an object, the implicit function, may be defined as

$$f(x, y, z) = \sum_{i=1}^{n} f_i(x, y, z), \qquad (16.3)$$

where n skeletal elements contribute to the resulting field value. An example is
shown in Figure 16.3, in which the field at any point (x, y, z) is calculated as in
Equation (16.3).

Figure 16.3. Each column shows two point primitives approaching each other. From left
to right: the fall-off filter functions used are Blobby, Metaball, soft objects, and Wyvill. Image
courtesy Erwin DeGroot.

In this case, two point primitives are placed in close proximity. As the two
points are brought together, the surfaces bulge and then blend together. The term
filter function is used because the function causes the primitives to be blurred
together somewhat akin to a filter function for images. The summation blend is
the most compact and efficient blending operation that can be applied to implicit
surfaces (see Equation (16.3)).

One advantage of using filter functions with finite support is that primitives
that are far from p will have zero contribution and thus need not be considered
(Wyvill et al., 1986).
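The composition in Equation (16.2) and the summation blend in Equation (16.3) can be sketched in a few lines of Python. The text does not give formulas for the fall-off curves of Figure 16.2, so the cubic-in-r² filter below, g(r) = (1 - r²)³ clamped to zero for r ≥ 1, is an assumed stand-in with finite support; the primitive layout and function names are likewise hypothetical.

```python
import math


def falloff(r):
    """Assumed finite-support filter: 1 at r = 0, falling smoothly to 0 at r = 1."""
    return (1.0 - r * r) ** 3 if r < 1.0 else 0.0


def point_primitive_field(center, radius, p):
    """f_i = g_i composed with d_i: distance to the point skeleton, scaled, then filtered."""
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, center)))
    return falloff(distance / radius)


def summation_blend(primitives, p):
    """Global field: the sum of the contributions of all skeletal elements."""
    return sum(point_primitive_field(c, r, p) for c, r in primitives)


# Two point primitives of radius 1 placed one unit apart.
primitives = [((-0.5, 0.0, 0.0), 1.0), ((0.5, 0.0, 0.0), 1.0)]
midpoint = (0.0, 0.0, 0.0)
single = point_primitive_field(*primitives[0], midpoint)   # one primitive alone
blended = summation_blend(primitives, midpoint)            # both contributions
```

With an iso-value of 0.5, the midpoint lies outside either primitive on its own but inside the blended surface, which is the bulging-then-merging behavior described above. Because the filter has finite support, a spatial index could skip every primitive farther than its radius from p.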

16.1.1 $C^1$ Continuity and the Gradient

The most basic form of continuity is $C^0$ continuity, which ensures that there are no
“jumps” in a function. Higher-order continuity is defined in terms of derivatives
of functions (see Chapter 15).

In the case of a 3D scalar field f, the first derivative is a vector function known
as the gradient, written $\nabla f$ and defined as

$$\nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z} \right).$$

If $\nabla f$ is defined at all points, and the three one-dimensional partial derivatives
are each $C^0$, then f is $C^1$. Informally, $C^1$ surface continuity means that the
surface normal varies smoothly over the surface. The surface normal is the unit
vector perpendicular to the surface. If no unique surface normal can be defined,
as on the edge of a cube, for example, then the surface is not $C^1$. For points on an
implicit surface the surface normal can be computed by normalizing the gradient
vector $\nabla f$. In the example of the circle, points inside have a negative value and
those on the outside have a positive one. For many types of implicit surfaces, the
sense of inside and outside is inverted, and since the normal vector must always
point outwards, it can be opposite to the gradient direction.
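When no analytic gradient is available, a numerical estimate is a common fallback. The Python sketch below approximates the gradient with central differences and normalizes it into an outward normal; the step size `h`, the flag `inside_is_negative`, and the two example fields are assumptions chosen for illustration.

```python
import math


def gradient(f, p, h=1e-5):
    """Central-difference estimate of (df/dx, df/dy, df/dz) at point p."""
    x, y, z = p
    return (
        (f(x + h, y, z) - f(x - h, y, z)) / (2 * h),
        (f(x, y + h, z) - f(x, y - h, z)) / (2 * h),
        (f(x, y, z + h) - f(x, y, z - h)) / (2 * h),
    )


def outward_normal(f, p, inside_is_negative=True):
    """Unit normal; flipped when the field is larger inside than outside."""
    g = gradient(f, p)
    length = math.sqrt(sum(c * c for c in g))
    sign = 1.0 if inside_is_negative else -1.0
    return tuple(sign * c / length for c in g)


def sphere(x, y, z):
    """Algebraic sphere of radius 2: negative inside, positive outside."""
    return x * x + y * y + z * z - 4.0


def point_primitive(x, y, z):
    """Assumed skeletal point primitive: 1 at the center, 0 beyond distance 1."""
    r2 = x * x + y * y + z * z
    return (1.0 - r2) ** 3 if r2 < 1.0 else 0.0


n_sphere = outward_normal(sphere, (0.0, 0.0, 2.0))
n_skeletal = outward_normal(point_primitive, (0.0, 0.0, 0.5), inside_is_negative=False)
```

Both normals point along +z even though the two gradients point in opposite directions, which is the inversion of inside and outside mentioned above.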

Skeletal implicit primitives are created by applying a fall-off filter function to
an unsigned distance field as in Equation (16.2). Although the distance field is
never $C^1$ at the skeleton, these discontinuities can be removed by using a suitable
fall-off function (Akleman & Chen, 1999). If an operator, g, combines implicit
functions, $f_1$ and $f_2$, where all points are $C^1$, then $g(f_1, f_2)$ is not necessarily
$C^1$. For example, it is possible to make a sharp CSG junction using the min and
max operators. The combination is not $C^1$ continuous because the min and max
operators don’t have that property (see Section 16.5).

The analysis of operators is complicated by the fact that it is sometimes desirable
to create a $C^1$ discontinuity. This case occurs whenever a crease in the
surface is desired. For example, a cube is not $C^1$ because tangent discontinuities
occur at each edge. To create creases using $C^1$ primitives, the operator must
introduce $C^1$ discontinuities, and hence cannot be $C^1$ itself.
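The loss of smoothness under min and max can be observed numerically. In the Python sketch below, two smooth point primitives are combined with max (taken here as union, assuming the skeletal convention that field values are larger inside) and with a plain sum, and the one-sided slopes of each result are compared at the point where the two fields are equal. The fall-off function and all names are hypothetical choices for this example.

```python
import math


def point_primitive(center, radius):
    """Return a smooth field function for a point skeleton (assumed fall-off)."""
    def field(p):
        r = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, center))) / radius
        return (1.0 - r * r) ** 3 if r < 1.0 else 0.0
    return field


def union(f1, f2):
    """Sharp union under the larger-is-inside convention: pointwise maximum."""
    return lambda p: max(f1(p), f2(p))


def intersection(f1, f2):
    """Sharp intersection under the same convention: pointwise minimum."""
    return lambda p: min(f1(p), f2(p))


def blend(f1, f2):
    """Summation blend for comparison."""
    return lambda p: f1(p) + f2(p)


def slope_x(f, x, h):
    """One-sided finite-difference slope along the x-axis, from x towards x + h."""
    return (f((x + h, 0.0, 0.0)) - f((x, 0.0, 0.0))) / h


a = point_primitive((-0.5, 0.0, 0.0), 1.0)
b = point_primitive((0.5, 0.0, 0.0), 1.0)

sharp = union(a, b)
smooth = blend(a, b)
sharp_left, sharp_right = slope_x(sharp, 0.0, -1e-6), slope_x(sharp, 0.0, 1e-6)
smooth_left, smooth_right = slope_x(smooth, 0.0, -1e-6), slope_x(smooth, 0.0, 1e-6)
```

The two one-sided slopes of the max field have opposite signs, so its gradient jumps across x = 0 and the iso-surface can carry a crease there, whereas both slopes of the summed field are close to zero.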

16.1.2 Distance Fields, R-Functions, and F-Reps 

The distance field is defined with respect to some geometric object T:

$$F(T, p) = \min_{q \in T} |q - p|.$$

Visually, F(T, p) is the shortest distance from p to T. Hence, when p lies on T,
F(T, p) = 0 and the surface created by the implicit function is the object T. Outside
of T, a non-zero distance is returned. The object T can be any geometric
entity embedded in 3D: a point, curve, surface, or solid. Procedural modeling
with distance fields started with Ricci (Ricci, 1973); R-functions (Rvachev, 1963)
were first applied to shape modeling more than 20 years later (see (Shapiro, 1994)
and (A. Pasko et al., 1995)).

An R-function or Rvachev function is a function whose sign can change if
and only if the sign of one of its arguments changes; that is, its sign is determined
solely by the signs of its arguments. R-functions provide a robust theoretical framework for
boolean composition of real functions, permitting the construction of $C^n$ CSG
operators (Shapiro, 1988). These CSG operators can be used to create blending
operators simply by adding a fixed offset to the result (A. Pasko et al., 1995).
Although these blending functions are no longer technically R-functions, they
have most of the desirable properties and can be mixed freely with R-functions
to create complex hierarchical models (Shapiro, 1988). These R-function-based
blending and CSG operators are referred to as R-operators (see Section 16.4).
The Hyperfun system (Adzhiev et al., 1999) is based on F-reps (function representation),
another name for an implicit surface. The system uses a procedural
C-like language to describe many types of implicit surfaces.


16.1.3 Level Sets 

It is useful to represent an implicit field discretely via a regular grid (Barthe et al.,
2002) or an adaptive grid (Frisken et al., 2000). This is exactly what the polygonization
algorithm does in the case of level sets; moreover, the grid can be used
for various other purposes besides building polygons. Discrete representations of
f are commonly obtained by sampling a continuous function at regular intervals.
For example, the sampled function may be defined by other volume model representations
(V. V. Savchenko et al., 1998). The data may also be a physical object
sampled using three-dimensional imaging techniques. Discrete volume data has
most often been used in conjunction with the level sets method (Osher & Sethian,
1988), which defines a means for dynamically modifying the data structure using
curvature-dependent speed functions. Interactive modeling environments based
on level sets have been defined (Museth et al., 2002), although level sets are only
one method employing a discrete representation of the implicit field. Methods
for interactively defining discrete representations using standard implicit surfaces
techniques have also been explored (Baerentzen & Christensen, 2002).

A key advantage to employing a discrete data structure is its ability to act as a
unifying approach for all of the various volume models defined by potential fields
(discrete or not) (V. V. Savchenko et al., 1998). The conversion of any continuous
function to a discrete representation introduces the problem of how to reconstruct
a continuous function, needed for the combined purposes of additional modeling
operations and visualization of the resulting potential field. A well-known solution
to this problem is to apply a filter g using the convolution operator (see Chapter 9).
The choice of a filter is guided by the desired properties of the reconstruction, and
many filters have been explored (Marschner & Lobb, 1994). The salient point is
that there is typically a trade-off between the efficiency of the chosen filter and
the smoothness of the resulting reconstruction; see also Section 16.9.

To be interactive, a discrete system must restrict the size of the grid relative
to the available computing power. This, in turn, limits the ability of the modeler
to include high-frequency details. Additionally, the smoothing triquadratic
filter makes it impossible to include sharp edges should they be desired. A partial
solution to this problem is the use of adaptive grids, although with any discrete
representation there will be limitations. A discrete grid is used in (Schmidt,
Wyvill, & Galin, 2005) to act as a cache representing a BlobTree node. The grid in
this work is used for fast prototyping and uses trilinear interpolation for position
and the slower, more accurate triquadratic interpolation to calculate gradient values,
because the eye is more discerning in observing gradient errors than position
errors.
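As a rough illustration of reconstructing a continuous field from a regular grid of samples, the Python sketch below stores field values in nested lists and evaluates them with trilinear interpolation. It is not the caching scheme of the cited work; the grid layout, the clamping at the boundary, and the function names are assumptions of this example.

```python
import math


def sample_grid(f, origin, spacing, counts):
    """Sample f at counts[0] x counts[1] x counts[2] regularly spaced vertices."""
    ox, oy, oz = origin
    nx, ny, nz = counts
    return [[[f(ox + i * spacing, oy + j * spacing, oz + k * spacing)
              for k in range(nz)] for j in range(ny)] for i in range(nx)]


def trilinear(grid, origin, spacing, p):
    """Reconstruct the field at p from the eight surrounding grid samples."""
    dims = (len(grid), len(grid[0]), len(grid[0][0]))
    cell, frac = [], []
    for coord, start, n in zip(p, origin, dims):
        s = (coord - start) / spacing
        index = min(max(int(math.floor(s)), 0), n - 2)   # clamp to a valid cell
        cell.append(index)
        frac.append(s - index)
    (i, j, k), (u, v, w) = cell, frac
    value = 0.0
    for di, wu in ((0, 1.0 - u), (1, u)):
        for dj, wv in ((0, 1.0 - v), (1, v)):
            for dk, ww in ((0, 1.0 - w), (1, w)):
                value += wu * wv * ww * grid[i + di][j + dj][k + dk]
    return value


def linear_field(x, y, z):
    """A test field that trilinear interpolation reproduces exactly."""
    return x + 2.0 * y + 3.0 * z


grid = sample_grid(linear_field, (0.0, 0.0, 0.0), 0.5, (3, 3, 3))
reconstructed = trilinear(grid, (0.0, 0.0, 0.0), 0.5, (0.3, 0.7, 0.9))
```

Trilinear reconstruction is continuous but its gradient jumps across cell faces, which is consistent with the text's remark that a smoother, slower filter is preferred for gradient values.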


16.1.4 Variational Implicit Surfaces 

It is often required to convert sampled data to an implicit representation. Variational
implicit surfaces interpolate or approximate a set of points using a weighted
sum of globally-supported basis functions (V. Savchenko et al., 1995; Turk &
O’Brien, 1999; J. C. Carr et al., 2001; Turk & O’Brien, 2002). These radially
symmetric basis functions are applied at each sample point. The continuity of
such a surface depends on the choice of basis function. The $C^2$ thin-plate spline
is most commonly used (Turk & O’Brien, 2002; J. C. Carr et al., 2001). Like
Blinn’s exponential function (see Figure 16.2), this function is unbounded as is
the resulting variational implicit surface.

If the field is globally $C^2$, creases cannot be defined;² however, anisotropic
basis functions can be used to produce fields which change more rapidly and may
appear to have creases (Dinh et al., 2001). At the appropriate scale, the surface
is still smooth. The smooth field implies that self-intersections do not occur, and
hence volumes are always well-defined. The thin-plate spline guarantees that
global curvature is minimized (Duchon, 1977). Variational interpolation has many
properties which are desirable for 3D modeling; however, controlling the resulting
surfaces can be difficult.

² Except see Section 15.2.

Figure 16.4. Two blended cylinders. Left: summation blend; right: convolution
surface with barely discernible bulge (Bloomenthal, 1997). Image courtesy
Erwin DeGroot.


Variational implicit surfaces can also be based on compactly-supported radial
basis functions (CS-RBFs) to reduce the computational cost of variational interpolation
techniques (Morse et al., 2001). Each CS-RBF only influences a local
region, so computing f(p) requires only evaluation of basis functions within some
small neighborhood of p. As with the globally-supported counterpart, the resulting
field is $C^k$, creases are not supported, and self-intersections cannot occur.³
The local support of each basis function results in a bounded global field. This
also guarantees that additional iso-contours will be present, as noted by various
researchers (Ohtake et al., 2003; Reuter, 2003).

16.1.5 Convolution Surfaces 

Convolution surfaces, introduced by Bloomenthal and Shoemake (Bloomenthal
& Shoemake, 1991), are produced by convolving a geometric skeleton S with a
kernel function h. Hence, the value at any position in space is defined by an
integral over the skeleton:



Any finitely-supported function can be used as h; see (Sherstyuk, 1999) for a 
detailed analysis of different kernels. 
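A convolution field for a line-segment skeleton can be approximated by numerical integration. The Python sketch below assumes the field is the integral, over points s of the skeleton, of a kernel applied to the distance between s and p; that reading, the particular finite-support kernel, the midpoint rule, and the function names are all assumptions of this example rather than the formulation used in the cited papers.

```python
import math


def kernel(distance, support=1.0):
    """Assumed finitely supported kernel: (1 - (d/support)^2)^3 inside the support, else 0."""
    r = distance / support
    return (1.0 - r * r) ** 3 if r < 1.0 else 0.0


def convolve_segment(a, b, p, samples=1000):
    """Midpoint-rule estimate of the integral of kernel(|s - p|) for s along segment ab."""
    length = math.sqrt(sum((y - x) ** 2 for x, y in zip(a, b)))
    ds = length / samples
    total = 0.0
    for n in range(samples):
        t = (n + 0.5) / samples
        s = [x + t * (y - x) for x, y in zip(a, b)]
        d = math.sqrt(sum((c - q) ** 2 for c, q in zip(s, p)))
        total += kernel(d) * ds
    return total


a, b = (-5.0, 0.0, 0.0), (5.0, 0.0, 0.0)
near_middle = convolve_segment(a, b, (0.0, 0.3, 0.0))
shifted = convolve_segment(a, b, (1.0, 0.3, 0.0))
far_away = convolve_segment(a, b, (0.0, 2.0, 0.0))
```

Away from the ends of the segment the field depends only on the distance to the line, so an iso-surface there is a constant-radius offset of the skeleton.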

Like skeletal primitives, convolution surfaces have bounded fields. Blinn’s
“blobby molecules” is the simplest form of a convolution surface (J. Blinn, 1982);
in this case, the skeleton consists of points only. This idea was extended by
Bloomenthal to include line, arc, triangle, and polygon skeletons (Bloomenthal
& Shoemake, 1991). These represent 1D and 2D primitives; 3D primitives were
later described by Bloomenthal (Bloomenthal, 1995).

Combination of convolution surfaces is defined by composition of the underlying
geometric skeletons and has the advantage of eliminating the bulges that
tend to occur when composing multiple skeletal primitives with additive blending.
The surface resulting from convolution of the combined skeleton does not
have bulges, as in Figure 16.4, and the field is continuous even if the combined
skeleton is non-convex. Convolution surfaces are offset a fixed distance from
convex portions of a skeleton, but produce a fillet along concave portions of a
skeleton.

An example of skeletal elements convolved to build a complex model is shown 
in Figure 16.5. The hand model contains fourteen primitives. 

³ Note, k > 0 depending on the RBF (see Section 15.2).

Figure 16.5. Skeletal elements convolved to build a hand model. Image courtesy Jules
Bloomenthal.


16.1.6 Defining Skeletal Primitives 


As we will see in the following sections, rendering the implicit models requires
finding the field value and gradient for a large number of points. We need the
distance to supply to Equation (16.2), and the gradient is useful for root finding as
well as lighting calculations. Supplying the distance to the fall-off filter functions
of Figure 16.2 is a matter of calculating the nearest distance to the skeletal primitive,
which is simple for point primitives but a little trickier for more complex geometrical
shapes. A line segment primitive (AB) can be defined as a cylinder around a line
with hemispherical end caps (see Figure 16.6). Point $P_0$ lies on the surface, where
$f(P_0) = iso$, and $f(P_1) = 0$ since $P_1$ lies outside of the influence of the line primitive.
The distance from some point $P_i$ to the line is found by simply projecting onto the
line AB and calculating the perpendicular distance, e.g., $|CP_0|$; this can be found
from AC, since A, $P_0$, and B are all known:

$$AC = AB \, \frac{AP_0 \cdot AB}{\|AB\|^2}.$$

Figure 16.6. Line primitive AB and example points $P_0$, $P_1$, $P_2$ showing distance
calculation.


Figure 16.7. Cylinder primitive blended with a sphere. Image courtesy Erwin DeGroot.

Figure 16.8. Implicit models from various skeletal primitives. Image courtesy Erwin DeGroot.

Figure 16.9. A ray-traced dinosaur model showing the underlying skeletal
primitives. Image courtesy Erwin DeGroot.

In Figure 16.6 the field value at $P_2$ is greater than zero, since $P_2$ is in the hemispherical
end cap, which can be checked separately. Variations of this idea can define primitives
with end caps of different radii, producing interesting cone shapes. An example is
shown in Figure 16.7.
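The projection onto the line AB and the separate end-cap check can be combined by clamping the projection parameter to the segment. The Python sketch below is one possible way to compute the distance that would be fed to a fall-off filter; the function names and the clamping approach are choices made for this illustration.

```python
import math


def closest_point_on_segment(a, b, p):
    """Project p onto line AB, then clamp so the result stays between A and B."""
    ab = [y - x for x, y in zip(a, b)]
    ap = [y - x for x, y in zip(a, p)]
    t = sum(u * v for u, v in zip(ap, ab)) / sum(u * u for u in ab)
    t = min(1.0, max(0.0, t))      # beyond an end point, that end point is nearest
    return [x + t * u for x, u in zip(a, ab)]


def segment_distance(a, b, p):
    """Shortest distance from p to the segment AB (perpendicular, or to an end point)."""
    c = closest_point_on_segment(a, b, p)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, c)))


a, b = (0.0, 0.0, 0.0), (2.0, 0.0, 0.0)
beside = segment_distance(a, b, (1.0, 1.0, 0.0))     # perpendicular case
past_end = segment_distance(a, b, (3.0, 0.0, 0.0))   # hemispherical end-cap case
```

An iso-surface of a fall-off filter applied to this distance is a cylinder with hemispherical end caps, because points past either end measure their distance to the end point itself.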

A great variety of geometrical skeletons have been described, and, in principle,
it is simply a matter of defining the distance to the skeleton from some point p
and also the gradient at p. For example, an offset surface of a triangle can be defined
from the vertices of the triangle and a radius r. A simple way to implement
this is to use line segment primitives to describe bounding cylinders connecting
the vertices (radius r). The distance from a point q within the triangle that does
not fall within the bounding fields of one of the line segment primitives is returned
as the perpendicular distance to the plane of the triangle. Other examples include
an implicit disk, defined by a circle and a thickness parameter, a torus also defined
by a circle and the radius of the cross section (or inner and outer circle radii), a
circular cone from a disk and a height, a cube with rounded corners, etc. (see
Figure 16.8).


16.2 Rendering 

Modeling methods, such as parametric surfaces, lend themselves to visualization,
since it is easy to iterate over points on the surface that can be found directly from
the defining equations; for example $(x, y) = (\cos\theta, \sin\theta)$, $\theta \in [0, 2\pi)$ produces
a circle.

There are two techniques that are commonly used to render implicit surfaces:
ray tracing and surface tiling. In practice, a designer wants to visualize an implicit
surface model quickly, sacrificing quality for speed for interaction purposes.
Prototyping algorithms have been concerned with producing a polygon mesh that can
be rendered in real time on modern workstations. Finding the polygonal mesh
which best approximates the desired surface is referred to as polygonization or
surface tiling. For animation or for a final visualization, where quality is preferred
over speed, ray tracing implicit surfaces directly without first polygonizing
produces excellent results.

As previously mentioned, finding an implicit surface requires searching
through space to find the points that satisfy $f(\mathbf{p}) = 0$. There are two main
approaches to executing such a search: space partitioning, i.e., partitioning space into
manageable units such as cubes, and non-space partitioning, e.g., marching
triangles (Hartmann, 1998; Akkouche & Galin, 2001) and the shrinkwrap algorithm
(Overveld & Wyvill, 2004).

In this chapter we describe the original space partitioning algorithm and leave
it to the reader to explore the more advanced methods. This algorithm, together
with post-processing for mesh refinement (see Chapter 12) and caching, provides a
method for interactive viewing of implicit models on modern workstations.


16.3 Space Partitioning 

16.3.1 Exhaustive Search 

The basic cubic space partitioning algorithm for tiling implicit surfaces was first
published in (Wyvill et al., 1986), and a similar algorithm oriented towards volume
visualization, called marching cubes, was published in (W. Lorensen & Cline, 1987).
Since then there have been many refinements and extensions.

A first approach to finding the implicit surface might be to subdivide space
uniformly into a regular lattice of cubic cells and calculate a value for every
vertex. Each cell is replaced with a set of polygons that best approximates the part
of the surface contained within that cell. The problem with this method is that
many of the cells will be completely outside or completely inside the volume;
thus, many cells that contain no part of the surface are processed. For large grids
of data this can be very time consuming and memory intensive.

To avoid storing the whole grid, a hash table is used to store only the cubes
that contain a piece of the surface, based on the data structures used in (Wyvill et
al., 1986). Working software was published in Graphics Gems IV (Bloomenthal,
1990). The algorithm is based on numerical continuation; it starts with a seed
cube that intersects part of the surface and builds neighboring cubes as necessary
to follow the surface.

The algorithm has two parts. In the first part, cubic cells are found that contain 
the surface and in the second part, each cube is replaced by triangles. The first 
part of the algorithm is driven by a queue of cubes, each of which contains part of 
the surface; the second part of the algorithm is table-driven. 


16.3.2 Algorithm Description 

A brief overview of the algorithm is as follows:

• divide space into cubic voxels; 

• search for surface, starting from a skeletal element; 

• add voxel to queue, mark it visited; 

• search neighbors; 

• when done, replace voxel with polygons.

First, space is subdivided into a cubic lattice, and the next task is to find a seed
cube containing part of the surface. A cube vertex $v_i$ inside the surface will have
a field value $f(v_i) \ge \mathit{iso}$ and a vertex outside the surface will have a field value
$f(v_i) < \mathit{iso}$; thus, an edge with one of each type of vertex will intersect the surface.
We call this an intersecting edge. The field value at the nearest cube vertex to the
first primitive can be evaluated by summing the contributions of the primitives
as per Equation (16.3), although other operators can also be used as will be seen
later. We will assume that $f(v_0) > \mathit{iso}$, which indicates that $v_0$ lies within the
solid. The value of iso is chosen by the user; an example is iso = 0.5 when using
the soft fall-off function, which has some symmetry properties that lead to nice
blending (see Figure 16.3). The vertices along one axis are evaluated in turn until
a value $f(v_i) < \mathit{iso}$ is found. The cube containing the intersecting edge is the seed
cube.
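The axis walk described above can be sketched in C as follows. This is only an illustration under stated assumptions: `field` is a hypothetical single point primitive with a simple quadratic fall-off (not the chapter's soft fall-off function), and `find_seed_i` is an illustrative name. The walk runs along +x from a lattice vertex assumed to be inside the surface.

```c
#include <assert.h>

/* Hypothetical field (an assumption for illustration): one point
   primitive at the origin with fall-off 1 - r^2/R^2, R = 2. */
double field(double x, double y, double z)
{
    double v = 1.0 - (x * x + y * y + z * z) / 4.0;
    return v > 0.0 ? v : 0.0;
}

/* Walk along +x from lattice vertex (i, j, k), assumed to lie inside
   the surface, until a vertex with field < iso is found. Returns the
   i index of the last inside vertex, i.e., the identifying vertex of a
   cube that contains the intersecting edge, or -1 if no intersecting
   edge is found within max_steps. */
int find_seed_i(int i, int j, int k, double side, double iso, int max_steps)
{
    assert(side > 0.0 && max_steps >= 0);
    for (int s = 0; s < max_steps; ++s) {
        double next = field((i + 1) * side, j * side, k * side);
        if (next < iso)
            return i;   /* edge (i,j,k)-(i+1,j,k) is an intersecting edge */
        ++i;
    }
    return -1;
}
```

With `side = 0.5` and `iso = 0.5`, a call such as `find_seed_i(0, 0, 0, 0.5, 0.5, 100)` walks outward from the origin until the field drops below the iso-value. As the text notes, a primitive with negative influence may defeat such a simple walk.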

The neighbors of the seed cube are examined, and those that contain at least
one intersecting edge are added to the queue ready for processing. To process a
cube we examine each face. If any of the bounding edges have oppositely signed
vertices, the surface will pass through that face and the face neighbor must be
processed. When this process has been completed for all the faces, the second
phase of the algorithm is applied to the cube. If the surface is closed, eventually
a cube will be re-visited and no more unmarked neighbors found, and the search
algorithm will terminate. Processing a cube involves marking it as processed
and processing its unmarked neighbors. Those that contain intersecting edges are
processed until the entire surface has been covered (see Figure 16.10).

Figure 16.10. A section through the cubic lattice. The + sign indicates a vertex inside the
surface ($f(v_i) > \mathit{iso}$) and - is outside ($f(v_i) < \mathit{iso}$).

Each cube is indexed by an identifying vertex which we define to be the lower-left
far corner (i.e., the vertex with the lowest $(x, y, z)$-coordinate values; see
Figure 16.11). For each vertex that is inside the surface, the corresponding bit will
be set to form the address in an 8-bit table (see Figure 16.11 and Section 16.3.3).

The identifying vertex is addressed by integers $i, j, k$, computed from the
$(x, y, z)$-coordinate location of the cube such that $x = \mathit{side} * i$, etc., where side is
the size of the cube. The identifying vertex of each cube may appear in as many
as eight other cubes, and it would be inefficient to store these vertices more than
once. Thus, the vertices are stored uniquely in a chained hash table. Since most
of the space does not contain any part of the surface, only those cubes that are
visited will be stored. The implicit function value is found for each vertex as it is
stored in the hash table.

Nothing is known about the topology of the surface so a search must be started
from every primitive to avoid any disconnected parts of the surface being missed.
A scalar can be used to scale the influence of a primitive. If the scalar can be less
than zero, then it is possible to search along an axis without finding an intersecting
edge. In this case, a more sophisticated search must be done to find a seed
cube (Galin & Akkouche, 1999).

Data Structures 

The hash table entry holds five values:

• the $i, j, k$ lattice indices of the identifying vertex (see Figure 16.11);

• $f$, the implicit function value of the identifying vertex;

• a Boolean to indicate whether this cube has been visited.

Figure 16.11. Vertex numbering.


The hash function computes an address in the hash table by selecting a few bits out
of each of $i, j, k$ and combining them arithmetically. For example, taking the five least
significant bits of each index produces a 15-bit address for a table, which must have a length
of $2^{15}$. Such a hash function can be neatly implemented in the C preprocessor as
follows:

#define NBITS 5
#define BMASK 037
#define HASH(a,b,c) (((((a)&BMASK)<<NBITS | ((b)&BMASK))<<NBITS) | ((c)&BMASK))
#define HSIZE (1<<(NBITS*3))
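To make the chained table concrete, here is a minimal sketch of an entry and a find-or-insert routine built on the macros above. The names `Entry`, `table`, and `find_or_insert` are hypothetical, and the caller is assumed to supply the field value; this is one possible layout, not necessarily the published implementation.

```c
#include <assert.h>
#include <stdlib.h>

#define NBITS 5
#define BMASK 037
#define HASH(a,b,c) (((((a)&BMASK)<<NBITS | ((b)&BMASK))<<NBITS) | ((c)&BMASK))
#define HSIZE (1<<(NBITS*3))

/* One chained entry: the five values listed in the text plus a link. */
typedef struct Entry {
    int i, j, k;          /* lattice indices of the identifying vertex */
    double f;             /* implicit function value at that vertex */
    int visited;          /* has this cube been visited? */
    struct Entry *next;   /* next entry in the same bucket */
} Entry;

static Entry *table[HSIZE];

/* Return the entry for (i, j, k), creating it with field value f if it
   is not stored yet. Returns NULL if allocation fails. */
Entry *find_or_insert(int i, int j, int k, double f)
{
    unsigned h = (unsigned)HASH(i, j, k);
    assert(h < HSIZE);
    for (Entry *e = table[h]; e != NULL; e = e->next)
        if (e->i == i && e->j == j && e->k == k)
            return e;
    Entry *e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    e->i = i; e->j = j; e->k = k;
    e->f = f;
    e->visited = 0;
    e->next = table[h];   /* push onto the front of the chain */
    table[h] = e;
    return e;
}
```

Indices that differ only above the fifth bit, such as `(1, 2, 3)` and `(33, 2, 3)`, hash to the same address and are kept apart by the chain.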

The queue (FIFO list) is used as temporary storage to identify the neighbors 
for processing. The algorithm begins with a seed cube that is marked as visited 
and placed on the queue. The first cube on the queue is dequeued and all its 
unvisited neighbors are added to the queue. Each cube is processed and passed to 
the second phase of the algorithm if it contains part of the surface. The queue is 
then processed until empty. 
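The queue-driven first phase can be sketched as below. To stay short, the sketch makes several simplifying assumptions that are not in the text: a small fixed lattice with a plain `visited` array stands in for the hash table, `field_at` is a hypothetical point primitive, and a neighbor is enqueued whenever it contains part of the surface rather than by testing the shared face.

```c
#include <assert.h>
#include <string.h>

#define N 16   /* lattice of N x N x N cubes (assumed small for the sketch) */

/* Hypothetical field sampled at lattice vertices: a point primitive at
   the lattice centre with fall-off 1 - r^2/R^2, R = 6 cube sides. */
double field_at(int i, int j, int k)
{
    double x = i - 8.0, y = j - 8.0, z = k - 8.0;
    double v = 1.0 - (x * x + y * y + z * z) / 36.0;
    return v > 0.0 ? v : 0.0;
}

/* A cube contains part of the surface if its eight vertices are not
   all on the same side of iso. */
int intersects(int i, int j, int k, double iso)
{
    int inside = 0;
    for (int c = 0; c < 8; ++c)
        if (field_at(i + (c & 1), j + ((c >> 1) & 1), k + ((c >> 2) & 1)) >= iso)
            ++inside;
    return inside != 0 && inside != 8;
}

/* Follow the surface from a seed cube with a FIFO queue; returns the
   number of surface cubes found. */
int follow_surface(int si, int sj, int sk, double iso)
{
    static unsigned char visited[N][N][N];
    static int queue[N * N * N][3];
    static const int d[6][3] = {
        {1, 0, 0}, {-1, 0, 0}, {0, 1, 0}, {0, -1, 0}, {0, 0, 1}, {0, 0, -1}
    };
    int head = 0, tail = 0, count = 0;

    assert(si >= 0 && si < N && sj >= 0 && sj < N && sk >= 0 && sk < N);
    memset(visited, 0, sizeof visited);
    if (!intersects(si, sj, sk, iso))
        return 0;
    visited[si][sj][sk] = 1;
    queue[tail][0] = si; queue[tail][1] = sj; queue[tail][2] = sk; ++tail;

    while (head < tail) {
        int *c = queue[head++];
        ++count;   /* phase two would polygonize this cube here */
        for (int n = 0; n < 6; ++n) {
            int i = c[0] + d[n][0], j = c[1] + d[n][1], k = c[2] + d[n][2];
            if (i < 0 || j < 0 || k < 0 || i >= N || j >= N || k >= N)
                continue;
            if (visited[i][j][k] || !intersects(i, j, k, iso))
                continue;
            visited[i][j][k] = 1;   /* mark before enqueueing */
            queue[tail][0] = i; queue[tail][1] = j; queue[tail][2] = k; ++tail;
        }
    }
    return count;
}
```

Marking a cube as visited at the moment it is enqueued keeps any cube from entering the queue twice.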


16.3.3 Polygonization Algorithm 

The second phase of the algorithm treats each cube independently. The cell is
replaced by a set of triangles that best matches the shape of the part of the surface
that passes through the cell. The algorithm must decide how to polygonize the cell
given the implicit function values at each vertex. These values will be positive or
negative (i.e., greater than or less than the iso-value), giving 256 combinations
of positive or negative vertices for the eight vertices of the cube. A table of 256
entries provides the right vertices to use in each triangle (Figure 16.12). For example,
entry 4 (00000100) points to a second table that records the vertices that
bound the intersecting edges. In this example, vertex number 2 is inside the surface
($f(V_2) \ge \mathit{iso}$) and, therefore, we wish to draw a triangle that connects the
points on the surface that intersect with edges bounded by $(V_2, V_0)$, $(V_2, V_3)$,
and $(V_2, V_6)$ as shown in Figure 16.13.

Figure 16.12. Table 2 contains the edges intersected by the surface. Table 1 points to the
appropriate entry in Table 2.
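Forming the table address from the eight vertex values can be sketched as follows. The function name `cube_index` is hypothetical, and bit n is assumed to correspond to vertex n in the numbering of Figure 16.11, which matches the example of entry 4 for vertex 2.

```c
#include <assert.h>
#include <stddef.h>

/* Build the 8-bit case index: bit n is set when vertex n of the cube
   is inside the surface (field value >= iso). */
unsigned cube_index(const double v[8], double iso)
{
    unsigned index = 0;
    assert(v != NULL);
    for (int n = 0; n < 8; ++n)
        if (v[n] >= iso)
            index |= 1u << n;
    return index;
}
```

Indices 0 and 255 correspond to cubes entirely outside or entirely inside the surface, which produce no triangles.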


Finding Cube-Surface Intersections 

Figure 16.13 shows a cube where vertex $V_2$ is inside the surface and all other
vertices are outside. Intersections with the surface occur on three edges as shown.
The surface intersects edge $V_2$-$V_6$ at the point $A$. The fastest but inaccurate way
to calculate $A$ is to use linear interpolation:

$$\frac{f(A) - f(V_2)}{f(V_6) - f(V_2)} = \frac{|A - V_2|}{\mathit{side}}.$$

If the cube side is 1 and the iso-value sought for $f(A)$ is 0.5, then

$$A = V_2 + \frac{0.5 - f(V_2)}{f(V_6) - f(V_2)}.$$


This works well for a static image, but in animation, error differences between
frames will be very noticeable. A root-finding method such as regula falsi should
be employed. This becomes more computationally costly as the gradient is needed
to evaluate the point of intersection. The gradient is also needed at surface points
for rendering. For many types of primitives it is simpler to find a numerical
approximation using sample points around $\mathbf{p}$, as in

$$\nabla f(\mathbf{p}) = \left( \frac{f(\mathbf{p} + \Delta x) - f(\mathbf{p})}{\Delta x},\ \frac{f(\mathbf{p} + \Delta y) - f(\mathbf{p})}{\Delta y},\ \frac{f(\mathbf{p} + \Delta z) - f(\mathbf{p})}{\Delta z} \right).$$

A reasonable value for $\Delta$ has been found empirically to be 0.01 * side, where side
is the length of a cube edge.
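The linear estimate, a regula falsi refinement, and the forward-difference gradient can be sketched together as below. All names are hypothetical, `sample_field` is an assumed test field rather than one of the chapter's primitives, and the edge crossing is returned as a parameter t in [0, 1] along the edge instead of as a point.

```c
#include <assert.h>

typedef double (*Field)(const double p[3]);

/* Assumed sample field: 1 - r^2/4 around the origin (not clamped). */
double sample_field(const double p[3])
{
    return 1.0 - (p[0] * p[0] + p[1] * p[1] + p[2] * p[2]) / 4.0;
}

/* Linear estimate of the crossing parameter between endpoint values
   f0 (at t = 0) and f1 (at t = 1). */
double edge_t_linear(double f0, double f1, double iso)
{
    assert(f1 != f0);
    return (iso - f0) / (f1 - f0);
}

/* Refine the crossing on edge a-b by regula falsi (false position).
   Assumes f(a) and f(b) lie on opposite sides of iso. */
double edge_t_regula_falsi(Field f, const double a[3], const double b[3],
                           double iso, int iterations)
{
    double t0 = 0.0, t1 = 1.0, t = 0.5;
    double g0 = f(a) - iso, g1 = f(b) - iso;
    for (int n = 0; n < iterations; ++n) {
        double p[3], g;
        t = t0 - g0 * (t1 - t0) / (g1 - g0);   /* root of the secant line */
        for (int c = 0; c < 3; ++c)
            p[c] = a[c] + t * (b[c] - a[c]);
        g = f(p) - iso;
        if (g == 0.0)
            break;
        if ((g < 0.0) == (g0 < 0.0)) { t0 = t; g0 = g; }   /* keep the bracket */
        else                         { t1 = t; g1 = g; }
    }
    return t;
}

/* Forward-difference gradient with step delta (the text suggests
   delta = 0.01 * side). */
void gradient(Field f, const double p[3], double delta, double g[3])
{
    double fp = f(p);
    assert(delta > 0.0);
    for (int c = 0; c < 3; ++c) {
        double q[3] = { p[0], p[1], p[2] };
        q[c] += delta;
        g[c] = (f(q) - fp) / delta;
    }
}
```

For the sample field, the edge from (1, 0, 0) to (2, 0, 0) crosses iso = 0.5 at x = √2; the linear estimate gives t = 1/3, while the refined parameter approaches √2 − 1.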

For manufacturing a mesh, as opposed to a set of independent triangles, a 
second hash table can maintain a list of all the intersecting edges. Since each cube 
edge is shared by up to four neighbors, the edge hash table prevents repetition of 
the surface-cube edge intersection calculation. The hash address can be derived 
from the same hash function as for vertices (applied to the edge endpoints). 



Figure 16.13. Finding the intersection of the surface with a cube edge.

Figure 16.14. Examples of vertices inside (+) and outside (-) the surface. Note the extra
sample gives a clue to avoid ambiguous cases.

Figure 16.15. Cube too large to capture small variation in implicit function.

16.3.4 Sampling Problems 

Ambiguities occur when opposite corners of a face (or the cube) have the same
sign and the other pair of vertices on the face have the opposite sign (see Figure 
16.14). A sample taken in the center of the face will give a clue as to whether the 
cube represents the meeting of two surfaces or a saddle. It should be made clear 
that a spatial grid stores a sample of the implicit function at every vertex. If the 
function happens to vary considerably within a cell the polygonal representation 
will not show such variations (see Figure 16.15). The surface cannot be resolved 
by sampling alone unless something is known about the curvature of the surface. 
A good discussion of this topic appears in (Kalra & Barr, 1989). 

This ambiguity problem (not the under-sampling problem) is avoided by
subdividing the cubic cell into tetrahedra. The tetrahedra can then be polygonized
unambiguously. Since there are four vertices in each tetrahedron, a table of
sixteen entries will provide the correct triangle information. The disadvantage is that
approximately twice the number of polygons will be generated.

Subdividing a Cube 

Without requiring additional cell vertices, a cube may be decomposed into five or
six tetrahedra as shown in Figure 16.16. These decompositions introduce diagonals
on the cube faces, and to maintain a consistent diagonal direction between
neighbors, the six-tetrahedra decomposition is preferable. The introduction of diagonal
edges produces a higher-resolution surface than replacing each cube directly with
triangles. The decomposition into tetrahedra and the replacement of the tetrahedra
with triangles are fast, table-driven algorithms, which produce topologically
consistent meshes.

Figure 16.16. Decomposing a cube into six tetrahedra. Image courtesy Erwin DeGroot.


16.3.5 Cell Polygonization 

Two obvious problems emerge from the use of uniform space subdivision. The
size of the triangles output by this algorithm does not adapt to the curvature of the
surface, and a further sample is required to resolve the ambiguities that arise when
cubic cells are replaced by polygons. A space subdivision algorithm based on an octree was
developed by Bloomenthal (Bloomenthal, 1988), which does adapt to the curvature
of the surface. Cells are subdivided into eight octants and cracks are avoided
by using a restricted octree scheme, i.e., neighboring cells cannot differ by more
than one level of subdivision. This indeed reduces the number of polygons
generated, but full advantage of large cells can only be taken if the flat regions of
the surface happen to fall entirely within the appropriate octants. The algorithm
proves in practice to be considerably slower than the uniform voxel algorithm and
is more complicated to implement.


16.4 More on Blending 

Section 16.1 showed that blending can be made to occur when field values are
summed. Ricci, in his landmark paper (Ricci, 1973), describes super-elliptic
blending. Given two functions $F_A$ and $F_B$, previously we simply found the
implicit value as $f_{total} = F_A + F_B$. We can denote this more general blending
operator as $A \diamond B$. The Ricci blend is defined as:

$$f_{A \diamond B} = \left( f_A^{\,n} + f_B^{\,n} \right)^{1/n}. \qquad (16.4)$$


It is interesting to point out the following properties:

$$\lim_{n \to +\infty} \left( f_A^{\,n} + f_B^{\,n} \right)^{1/n} = \max(f_A, f_B),$$

$$\lim_{n \to -\infty} \left( f_A^{\,n} + f_B^{\,n} \right)^{1/n} = \min(f_A, f_B).$$

Figure 16.17. By varying n, the Ricci blend may be made to change smoothly
from blend to union. Image courtesy Erwin DeGroot.

Moreover, this generalized blending is associative, i.e., $f_{(A \diamond B) \diamond C} = f_{A \diamond (B \diamond C)}$.
The standard blending operator + proves to be a special case of the super-elliptic
blend with n = 1. When n varies from 1 to infinity, it creates a set of blends
interpolating between blending $A + B$ and union $A \cup B$ (see Figure 16.17).
Figure 16.27 shows the nodes to be binary or unary; in fact the binary nodes can
easily be extended using the above formulation to n-ary nodes.

The power of Ricci’s operators is that they are closed under the operations 
on the space of all possible implicit volumes, meaning that an application of an 
operator simply produces another scalar field defining another implicit volume. 
This new field can be composed with other fields, again using Ricci’s operators. 
Equation (16.4) will always produce the exact union of two implicit volumes, 
regardless of how complex they are. Compared with the difficulties involved in 
applying boolean CSG operations to B-rep surfaces, solid modeling with implicit 
volumes is incredibly simple. 

Following Pasko’s functional representation (A. Pasko et al., 1995), another
generalized blending function may be defined:

$$f_{A \diamond B} = \left( f_A + f_B + \alpha \sqrt{f_A^2 + f_B^2} \right) \left( f_A^2 + f_B^2 \right)^{n/2}.$$

When $\alpha \in [-1, 1]$ varies from −1 to 1, it creates a set of blends interpolating
the union and the intersection operators. However, this operator is no longer
associative, which is incompatible with the definition of n-ary operators.


16.5 Constructive Solid Geometry 

Implicit models are frequently termed implicit surfaces; however, they are
inherently volume models and useful for solid modeling operations. Ricci introduced a
constructive geometry for defining complex shapes from operations such as union,
intersection, difference, and blend upon primitives (Ricci, 1973). The surface was
considered as the boundary between the half spaces $f(\mathbf{p}) < 1$, defining the
inside, and $f(\mathbf{p}) > 1$, defining the outside. This initial approach to solid modeling
evolved into constructive solid geometry or CSG (Ricci, 1973; Requicha, 1980).
CSG is typically evaluated bottom-up according to a binary tree, with low-degree
polynomial primitives as the leaf nodes and internal nodes representing Boolean
set operations. These methods are readily adapted for use in implicit modeling,
and in the case of skeletal implicit surfaces, the Boolean set operations union
$\cup_{max}$, intersection $\cap_{min}$, and difference $\setminus_{minmax}$ are defined as follows (Wyvill et
al., 1999):

$$\cup_{max} f = \max_{i=0}^{k-1} (f_i), \qquad (16.5)$$

$$\cap_{min} f = \min_{i=0}^{k-1} (f_i),$$

$$\setminus_{minmax} f = \min\left( f_0,\ 2 * \mathit{iso} - \max_{j=1}^{k-1} (f_j) \right).$$

The Ricci operators are illustrated in Figure 16.18 for point primitives A
and B. For union (bottom left) the field at all points inside the union will be the
greater of $f_A$ and $f_B$. For intersection (center), points in the region marked
as $P_1$ will have value $\min(f_A(P_1), f_B(P_1)) = 0$, since the contribution of B will
be zero outside of its range of influence. Similarly, for the region marked as $P_2$,
$\min(f_A(P_2), f_B(P_2)) = 0$
(influence of A is zero, i.e., the minimum), leaving only the intersection region
with positive values. Difference works similarly using the iso-value in the three
marked regions ($P_i$) as follows:

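$$\begin{aligned}
f(P_0) &= \min(f_B(P_0),\ 2 * \mathit{iso} - f_A(P_0)) = \min([\mathit{iso}, 1],\ [2 * \mathit{iso} - 1, \mathit{iso}]) = [2 * \mathit{iso} - 1, \mathit{iso}] < \mathit{iso}, \\
f(P_1) &= \min(f_B(P_1),\ 2 * \mathit{iso} - f_A(P_1)) = \min([0, \mathit{iso}],\ [2 * \mathit{iso} - 1, \mathit{iso}]) < \mathit{iso}, \\
f(P_2) &= \min(f_B(P_2),\ 2 * \mathit{iso} - f_A(P_2)) = \min([\mathit{iso}, 1],\ [\mathit{iso}, 2 * \mathit{iso}]) \ge \mathit{iso}.
\end{aligned}$$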
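The three set operations can be sketched as n-ary functions over an array of field values, following the max/min forms given for skeletal implicit surfaces. The function names are hypothetical; `f[0]` plays the role of the primitive from which the others are subtracted.

```c
#include <assert.h>

/* Union: maximum of the k field values. */
double csg_union(const double f[], int k)
{
    assert(k >= 1);
    double m = f[0];
    for (int i = 1; i < k; ++i)
        if (f[i] > m) m = f[i];
    return m;
}

/* Intersection: minimum of the k field values. */
double csg_intersection(const double f[], int k)
{
    assert(k >= 1);
    double m = f[0];
    for (int i = 1; i < k; ++i)
        if (f[i] < m) m = f[i];
    return m;
}

/* Difference: f[0] minus all the others,
   min(f0, 2*iso - max_{j>=1} f_j). */
double csg_difference(const double f[], int k, double iso)
{
    assert(k >= 1);
    if (k < 2)
        return f[0];
    double d = 2.0 * iso - csg_union(f + 1, k - 1);
    return f[0] < d ? f[0] : d;
}
```

With iso = 0.5, a point where `f[0] = 0.8` and `f[1] = 0.3` stays inside the difference, whereas a point where both values exceed iso falls outside it, as in the region analysis above.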


CSG operators create creases, i.e., $C^1$ discontinuities. For example, the max()
operator (Equation (16.5)) creates $C^1$ discontinuities at all points where
$f_1(\mathbf{p}) = f_2(\mathbf{p})$. When applied to two spheres, the discontinuities produced by this union
operator result in a crease on the surface, as shown in Figure 16.18, which is
the desired result. Discontinuities unfortunately extend into the field outside of
the surface, which is not visible in this image. If a blend is then applied to the
result of the union, the $C^1$-discontinuous plane in the field produces a shading
discontinuity (Figure 16.19).

Figure 16.18. Ricci operators for CSG. Image courtesy Erwin DeGroot.

Figure 16.19. Two point primitives on the left are connected by the Ricci
union. A third primitive is blended to the result, creating an unwanted crease in
the field. Image courtesy Erwin DeGroot.

Figure 16.20. Point Q returns the field value for point P.

The problem can be avoided to an extent (G. Pasko et al., 2002), and CSG
operators have been developed that are $C^1$ at all points except those where
$f_1(\mathbf{p}) = f_2(\mathbf{p}) = \mathit{iso}$ (Barthe et al., 2003).


16.6 Warping 

The ability to distort the shape of a surface by warping the space in its
neighborhood is a useful modeling tool. A warp is a continuous function $w(x, y, z)$
that maps $\mathbb{R}^3$ onto $\mathbb{R}^3$. Sederberg provides a good analogy for warping when
describing free form deformations (Sederberg & Parry, 1986). He suggests that the
warped space can be likened to a clear, flexible, plastic parallelepiped in which
the objects to be warped are embedded. A warped element may be defined by
simply applying some warp function $w(\mathbf{p})$ to the implicit equation:

$$f_i(x, y, z) = g_i \circ d_i \circ w_i(x, y, z). \qquad (16.6)$$

A warped element may be fully characterized by the distance to its
skeleton $d_i(x, y, z)$, its fall-off filter function $g_i(r)$, and eventually its warp function
$w_i(x, y, z)$. To render or perform operations on an implicit surface, the implicit
value of many points $f(P)$ must be found. First, $P$ is transformed by the warp
function to some new point $Q$, and $f(Q)$ is returned in place of $f(P)$. In
Figure 16.20, instead of returning the implicit value of some point $f(Q)$, the value
for $f(P)$ is returned. In this case, the iso-value is returned and the implicit surface
(curve in 2D) passes through $Q$ instead of $P$. Thus, the circle is warped into an
ellipse.
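The circle-to-ellipse case can be sketched in 2D as follows. Both function names are hypothetical, the field 1 − r² is an assumed stand-in for a skeletal primitive, and the warp is a plain scaling of x chosen only for illustration.

```c
#include <assert.h>

/* Assumed 2D field around the origin: 1 - r^2. With iso = 0.5 the
   iso-curve is a circle of radius sqrt(0.5). */
double circle_field(double x, double y)
{
    return 1.0 - (x * x + y * y);
}

/* Warped field: the query point is first mapped by the warp
   w(x, y) = (x / sx, y), then the unwarped field is evaluated there.
   The iso-curve becomes an ellipse stretched by sx along x. */
double warped_field(double x, double y, double sx)
{
    assert(sx > 0.0);
    return circle_field(x / sx, y);
}
```

With `sx = 2`, the point (1, 0) lies outside the circle for iso = 0.5 but inside the warped curve, while points on the y-axis are unaffected.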

Barr introduced the notion of global and local deformations using the
operations of twist, taper, and bend applied to parametric surfaces (Barr, 1984). The
deformations can be nested to produce models such as the one shown in
Figure 16.27. Conceptually, these are easy to apply to an implicit surface, as
indicated in Equation (16.6).

Note that the normal cannot be calculated in a similar manner to warping a
point. This problem is similar to the problem outlined in Section 13.2 on
instancing. In this case, the normal can most easily be approximated using the numerical
gradient of Section 16.3.3, although the use of the Jacobian, as suggested in (Barr, 1984), yields
precise results. The Barr warps are described in the following sections.


16.6.1 Twist 

In this example, the twist is around the z-axis by $\theta$ (see Figure 16.21 for three
blended implicit cylinders with a twist warp applied to them).

The twist around z is expressed as

$$w(x, y, z) = \begin{Bmatrix} x * \cos(\theta(z)) - y * \sin(\theta(z)) \\ x * \sin(\theta(z)) + y * \cos(\theta(z)) \\ z \end{Bmatrix}.$$


16.6.2 Taper 

Taper is applied along one major axis. A linear taper has proved to be the most
useful although quadratic and cubic tapers are easily implemented. For example,
a linear taper along the y-axis involves changing both x- and z-coordinates. A
linear scale is applied to y between $y_{max}$ and $y_{min}$:

$$s(y) = \frac{y_{max} - y}{y_{max} - y_{min}}, \qquad w(x, y, z) = \begin{Bmatrix} s(y)\, x \\ y \\ s(y)\, z \end{Bmatrix}.$$
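A linear taper along y can be sketched as follows, assuming the linear scale $s(y) = (y_{max} - y)/(y_{max} - y_{min})$, which equals 1 at $y_{min}$ and 0 at $y_{max}$. The name `taper_y` and the choice of this particular scale are assumptions for illustration.

```c
#include <assert.h>

/* Linear taper along y: x and z are scaled by
   s(y) = (ymax - y) / (ymax - ymin); y is unchanged. */
void taper_y(const double p[3], double ymin, double ymax, double q[3])
{
    assert(ymax != ymin);
    double s = (ymax - p[1]) / (ymax - ymin);
    q[0] = s * p[0];
    q[1] = p[1];
    q[2] = s * p[2];
}
```

Halfway between `ymin` and `ymax` the scale is 0.5, so the x- and z-coordinates are halved there.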




Figure 16.21. Three blended implicit cylinders twisted together. Image
courtesy Erwin DeGroot.

Figure 16.22. Three blended implicit cylinders, twisted then tapered. Image
courtesy Erwin DeGroot.


16.6.3 Bend 

Bend is also applied along one major axis. For the bend example below, the
bending rate is $k$ measured in radians per unit length, the axis of the bend is
$(x_0, 1/k)$, and the angle $\theta$ is defined as $(x - x_0) * k$. The bend around z is

$$w(x, y, z) = \begin{Bmatrix} -\sin(\theta) * (y - 1/k) + x_0 \\ \cos(\theta) * (y - 1/k) + 1/k \\ z \end{Bmatrix}.$$

Figure 16.23. Three blended implicit cylinders, twisted together, tapered
and bent. Image courtesy Erwin DeGroot.

16.7 Precise Contact Modeling 



Figure 16.24. Sea anemone deforms to implicit rock. Image courtesy
Mai Nur and X. Liang.

Precise contact modeling (PCM) is a method of deforming implicit surface
primitives in contact situations while maintaining a precise contact surface with $C^1$
continuity (Gascuel, 1993). PCM is important in that it is a simple and automatic
way of showing how a model can react to its environment. This cannot be so
easily done with non-implicit methods (see Figure 16.24).

PCM is implemented by the inclusion of a deforming function $s(\mathbf{p})$ that
modifies the field value returned for each point. For each pair of objects, collision is
first detected using a bounding-box test. Once it is established that a collision is
likely, PCM is applied. A local, geometric deformation term $s_i$ is computed and
added to the implicit function $f_i$. The volume of the colliding objects is divided
into an interpenetration region and a deformation region. The result of applying
$s_i$ is that the interpenetration region is compressed so that contact is maintained
without interpenetration occurring (see Figure 16.25). The effect of $s_i$ is
attenuated to zero within the propagation region so that the volume outside of the two
regions is not deformed.



Figure 16.25. A 2D slice through objects in collision showing the various regions and PCM
deformation. Image courtesy Erwin DeGroot.

Given two skeletal elements generating fields $f_1(\mathbf{p})$ and $f_2(\mathbf{p})$, the surface
around each one is calculated as

$$f_1(\mathbf{p}) + s_1(\mathbf{p}) = 0,$$

$$f_2(\mathbf{p}) + s_2(\mathbf{p}) = 0.$$

We need to generate a surface common to both elements (dotted line in
Figure 16.25), i.e., where they share a solution in the interpenetration region for some
$\mathbf{p}$ in that region:

$$s_1(\mathbf{p}) - f_1(\mathbf{p}) = \mathit{iso}, \qquad (16.7)$$

$$s_2(\mathbf{p}) - f_2(\mathbf{p}) = \mathit{iso}.$$

Intuitively, the deeper within object 1 that object 2 penetrates, the higher the
implicit value of object 1 and thus the more that object 2 will be compressed.

The function $s_i$ is defined to produce a smooth junction at the boundary of
the interpenetration region, in other words where $s_i = 0$ but its derivative is
greater than zero. From here to the boundary of the propagation region, $s_i$ is used
to attenuate the propagation to zero. The nearest point on the interpenetration
region boundary $\mathbf{p}_0$ is found by following the gradient.

Within the propagation region $s_i(\mathbf{p}) = h_i(r)$, where $\mathbf{p} = (x, y, z)$ is the point
whose implicit value is being calculated and $r = \|\mathbf{p} - \mathbf{p}_0\|$ (see Figure 16.26). The
value of $w_i$, set by the user, defines the size of the propagation region; no
deformation occurs beyond this region. To control how much the objects inflate in the
propagation region, the user provides a value for the parameter $a_i$. The maximum
value of $h_i$ is $M_i$. The current minimum of $s_i$ is negative in the
interpenetration region and is given as $s_{i,min}$, where $M_i = -a_i s_{i,min}$. Thus an object will
be compressed in the interpenetration region and will inflate in the propagation
region. The equation for $h_i$ is formed in two parts by two cubic polynomials that
are designed to join at $r = w_i/2$, where the slope is zero:

$$c = \frac{4 (w_i k - 4 M_i)}{w_i^3}, \qquad d = \frac{4 (3 M_i - w_i k)}{w_i^2},$$

$$h_i(r) = c r^3 + d r^2 + k r \quad \text{if } r \in [0, w_i/2],$$

$$h_i(r) = \frac{4 M_i}{w_i^3} (r - w_i)^2 (4r - w_i) \quad \text{if } r \in [w_i/2, w_i].$$

Figure 16.26. The function $h_i(r)$ is the value of the deformation function $s_i$ in the
propagation region.
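The two-piece cubic for the propagation region can be sketched as below, assuming the coefficients $c = 4(w k - 4M)/w^3$ and $d = 4(3M - w k)/w^2$ for the first piece. The name `pcm_h` is hypothetical, and returning zero outside $[0, w]$ reflects the statement that no deformation occurs beyond the propagation region.

```c
#include <assert.h>

/* Deformation profile h(r) in the propagation region.
   k: slope at r = 0, M: maximum value (reached at r = w/2),
   w: width of the propagation region. */
double pcm_h(double r, double k, double M, double w)
{
    assert(w > 0.0);
    if (r < 0.0 || r > w)
        return 0.0;                       /* no deformation outside [0, w] */
    if (r <= 0.5 * w) {
        double c = 4.0 * (w * k - 4.0 * M) / (w * w * w);
        double d = 4.0 * (3.0 * M - w * k) / (w * w);
        return c * r * r * r + d * r * r + k * r;
    }
    return 4.0 * M / (w * w * w) * (r - w) * (r - w) * (4.0 * r - w);
}
```

Both pieces evaluate to M at r = w/2 with zero slope there, and the second piece falls back to zero at r = w.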

It is desirable that we have $C^1$-continuity as we move from the
interpenetration to the propagation region. Thus, $h_i'(0) = k$ in Figure 16.26 is the directional
derivative of $s_i$ at the junction (marked as $\mathbf{p}_0$ in Figure 16.25). As indicated in
Equation (16.7), $s_i = -f_i$ in the interpenetration region, thus:

$$k = \|\nabla(f_i, \mathbf{p}_0)\|.$$

PCM is only an approximation to a properly deformed surface, but it is an 
attractive algorithm due to its simplicity. 


16.8 The BlobTree 

The BlobTree is a method that employs a tree structure that extends the CSG
tree to include various blending operations using skeletal primitives (Wyvill et
al., 1999). A system with similar capabilities, the Hyperfun project, used a
specialized language to describe F-rep objects (Adzhiev et al., 1999).

In the BlobTree system, models are defined by expressions that combine
implicit primitives and the operators $\cup$ (union), $\cap$ (intersection), $-$ (difference),
$+$ (blend), $\diamond$ (super-elliptic blend), and $w$ (warp). The BlobTree is not only the
data structure built from these expressions but also a way of visualizing the
structure of the models. The operators listed above are binary with the exception of
warp, which is a unary operator. In general it is more efficient to use n-ary rather
than binary operators. The BlobTree incorporates affine transformations as nodes
so that it is also a scene graph and primitives (e.g., skeletons) form the leaf nodes.

Figure 16.27. BlobTree. The spiral staircase is built from a central textured cylinder to
which the stairs and the railing are blended. The railing is comprised of a series of cylinders
blended with two circle (torus) primitives, blended together and further blended with a vertical
cylinder. The BlobTree is also a scene graph and instancing nodes repeat the various parts
transformed by the appropriate matrices. Each stair is made from a tapered polygon primitive
(that becomes an offset surface); intersection and union nodes combine the inflated disk with
the stair.

16.8.1 Traversing the BlobTree 

An example of a BlobTree including the Barr warps and CSG operations is shown
in Figure 16.27. Other nodes can include 2D texturing (Schmidt et al., 2006),
precise contact modeling, as well as animation and other attributes. The traversal of
the BlobTree is in essence very simple. All that is required to render the object
either by polygonizing or ray tracing is to find the implicit value of any point (and
the corresponding gradient). This can be done by traversing the tree.
Polygonization and ray-tracing algorithms need to evaluate the implicit field function at a
large number of points in space. The function $f(\mathcal{N}, \mathcal{M})$ returns the field value for
the node $\mathcal{N}$ at the point $\mathcal{M}$, which depends on the type of the node. The values $\mathcal{L}$
and $\mathcal{R}$ indicate that the left or right branch of the tree is explored. The algorithm
below is written (for simplicity) as if the tree were binary:

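function $f(\mathcal{N}, \mathcal{M})$:

• primitive: $f(\mathcal{M})$;

• warp: $f(\mathcal{L}(\mathcal{N}), w(\mathcal{M}))$;

• blend: $f(\mathcal{L}(\mathcal{N}), \mathcal{M}) + f(\mathcal{R}(\mathcal{N}), \mathcal{M})$;

• union: $\max(f(\mathcal{L}(\mathcal{N}), \mathcal{M}), f(\mathcal{R}(\mathcal{N}), \mathcal{M}))$;

• intersection: $\min(f(\mathcal{L}(\mathcal{N}), \mathcal{M}), f(\mathcal{R}(\mathcal{N}), \mathcal{M}))$;

• difference: $\min(f(\mathcal{L}(\mathcal{N}), \mathcal{M}), -f(\mathcal{R}(\mathcal{N}), \mathcal{M}))$.

Figure 16.28. “Spiral Stairs.” A complex BlobTree implicit model created in Erwin DeGroot’s
BlobTree.net system. (See also Plate VI.)

Figure 16.29. Outlines are inflated. Image courtesy Erwin DeGroot.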

A complex BlobTree model showing many of the features that have been
integrated is shown in Figure 16.28.
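The binary traversal listed in Section 16.8.1 can be sketched recursively as below. The node layout, the function-pointer fields, and the sample primitives `blob_a`, `blob_b`, and warp `shift_x` are all hypothetical; the blend is taken to be the plain sum and the difference follows the min form of the traversal list.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { PRIMITIVE, WARP, BLEND, UNION, INTERSECTION, DIFFERENCE } NodeType;

typedef struct Node {
    NodeType type;
    struct Node *left, *right;                     /* right is unused for WARP */
    double (*field)(const double p[3]);            /* PRIMITIVE only */
    void (*warp)(const double p[3], double q[3]);  /* WARP only */
} Node;

/* Assumed point primitive: fall-off 1 - r^2/R^2 with R = 1.5. */
static double blob(const double p[3], double cx)
{
    double dx = p[0] - cx;
    double v = 1.0 - (dx * dx + p[1] * p[1] + p[2] * p[2]) / 2.25;
    return v > 0.0 ? v : 0.0;
}
double blob_a(const double p[3]) { return blob(p, -1.0); }
double blob_b(const double p[3]) { return blob(p, 1.0); }

/* Assumed warp: translate the query point by -1 along x. */
void shift_x(const double p[3], double q[3])
{
    q[0] = p[0] - 1.0; q[1] = p[1]; q[2] = p[2];
}

/* Field value of node n at point m. */
double eval(const Node *n, const double m[3])
{
    double a, b, q[3];
    assert(n != NULL);
    if (n->type == PRIMITIVE)
        return n->field(m);
    if (n->type == WARP) {
        n->warp(m, q);
        return eval(n->left, q);
    }
    a = eval(n->left, m);
    b = eval(n->right, m);
    switch (n->type) {
    case BLEND:        return a + b;
    case UNION:        return a > b ? a : b;
    case INTERSECTION: return a < b ? a : b;
    default:           return a < -b ? a : -b;     /* DIFFERENCE */
    }
}
```

A polygonizer or ray tracer would call `eval` on the root node for each sample point; extending the binary nodes to n-ary ones amounts to looping over a child list instead of two fixed pointers.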


16.9 Interactive Implicit Modeling Systems 

Early sketch-based modeling systems, such as Teddy (Igarashi et al., 1999), used
a few drawn strokes from the user to infer a polygonal model in 3-space. With
better hardware and improved algorithms, sketch-based implicit modeling
systems are now possible. Shapeshop uses implicit sweep surfaces to manufacture
3D strokes from 2D user strokes and also preserves the hierarchy of the BlobTree,
unlike the early systems that produced homogeneous meshes (Schmidt, Wyvill,
Sousa, & Jorge, 2005). This enables a user to produce complex models of
arbitrary topology from a few simple strokes. The margin figures show a closed
drawn stroke (Figure 16.29) inflated into an implicit sweep and a second sweep
(Figure 16.30) that has a smaller sweep object subtracted using CSG.

One of the improvements that made this possible is a caching system that uses 
a fixed 3D grid of implicit values at each node of the BlobTree representing the 
values found by traversing the tree below the node (Schmidt, Wyvill, & Galin, 
2005). If the value of some point p is required at node N , a value may be returned 
without traversing the tree below N, provided that part of the tree is unaltered. 
Instead, an interpolation scheme (see Chapter 28) is used to find a value for p. This 
scheme speeds up traversal for complex BlobTrees and is one factor in enabling a 
system to run at interactive rates. 

The next generation of implicit modeling systems will exploit hardware
and software advances to be able to handle more and more complex hierarchical
models interactively. A more complex Shapeshop example is shown in
Figure 16.31.



Figure 16.30. BlobTree operations can be applied, e.g., CSG difference. Image
courtesy Erwin DeGroot.

Figure 16.31. “The Next Step.” A complex BlobTree implicit model created interactively in
Ryan Schmidt’s Shapeshop by artist, Corien Clapwijk (Andusan). (See also Plate VII.)

Exercises

1. In an implicit surface modeling system the fall-off filter function is defined 
as 



r > R, 

otherwise. 


where R is a constant. A point primitive placed at (−1, 0) and another
at (1, 0) are rendered to show the f = 0.5 iso-surface. The value R, the
distance where the potential due to the point falls to zero in both cases, is
1.5.


Calculate the potential at the point (0, 0) and at +0.5 intervals until the 
point (2.5,0). Sketch the 0.5 contour and the contour at which the field 
falls to zero. 

2. Why are the ambiguous cases in the polygonization algorithm considered 
to be a sampling problem? 

3. Calculate the error involved in using linear interpolation to estimate the 
intersection of an implicit surface and a cubic voxel. 

4. Design an implicit primitive function using the skeleton of your choice. The 
function must take as input a point and return an implicit value and also the 
gradient at that point. 


Michael Ashikhmin
Computer Animation


Animation is derived from the Latin anima and means the act, process, or result 
of imparting life, interest, spirit, motion, or activity. Motion is a defining property 
of life and much of the true art of animation is about how to tell a story, show 
emotion, or even express subtle details of human character through motion. A 
computer is a secondary tool for achieving these goals—it is a tool which a skillful 
animator can use to help get the result he wants faster and without concentrating 
on technicalities in which he is not interested. Animation without computers, 
which is now often called “traditional” animation, has a long and rich history of 
its own which is continuously being written by hundreds of people still active in 
this art. As in any established field, some time-tested rules have been crystallized 
which give general high-level guidance to how certain things should be done and 
what should be avoided. These principles of traditional animation apply equally 
to computer animation, and we will discuss some of them below. 

The computer, however, is more than just a tool. In addition to making the 
animator’s main task less tedious, computers also add some truly unique abil¬ 
ities that were simply not available or were extremely difficult to obtain be¬ 
fore. Modern modeling tools allow the relatively easy creation of detailed three- 
dimensional models, rendering algorithms can produce an impressive range of 
appearances, from fully photorealistic to highly stylized, powerful numerical sim¬ 
ulation algorithms can help to produce desired physics-based motion for partic¬ 
ularly hard to animate objects, and motion capture systems give the ability to 
record and use real-life motion. These developments led to an exploding use 
of computer animation techniques in motion pictures and commercials, automotive
design and architecture, medicine and scientific research among many other
areas. Completely new domains and applications have also appeared including 
fully computer-animated feature films, virtual/augmented reality systems and, of 
course, computer games. 

Other chapters of this book cover many of the developments mentioned above 
(for example, geometric modeling and rendering) more directly. Here, we will 
provide an overview only of techniques and algorithms directly used to create and 
manipulate motion. In particular, we will loosely distinguish and briefly describe 
four main computer animation approaches: 

• Keyframing gives the most direct control to the animator who provides nec¬ 
essary data at some moments in time and the computer fills in the rest. 

• Procedural animation involves specially designed, often empirical, mathe¬ 
matical functions and procedures whose output resembles some particular 
motion. 

• Physics-based techniques solve differential equations of motion.

• Motion capture uses special equipment or techniques to record real-world 
motion and then transfers this motion into that of computer models. 

We do not touch upon the artistic side of the field at all here. In general, we
cannot possibly do more here than just scratch the surface of the fascinating subject
of creating motion with a computer. We hope that readers truly interested in the 
subject will continue their journey well beyond the material of this chapter. 


17.1 Principles of Animation 

In his seminal 1987 SIGGRAPH paper (Lasseter, 1987), John Lasseter brought 
key principles developed as early as the 1930s by traditional animators of Walt
Disney studios to the attention of the then-fledgling computer animation community.
Twelve principles were mentioned: squash and stretch; timing; anticipation;
follow through and overlapping action; slow-in and slow-out; staging;
arcs; secondary action; straight-ahead and pose-to-pose action; exaggeration; 
solid drawing skill; appeal. Almost two decades later, these time-tested rules, 
which can make a difference between a natural and entertaining animation and a 
mechanistic-looking and boring one, are as important as ever. For computer ani¬ 
mation, in addition, it is very important to balance control and flexibility given to 
the animator with the full advantage of the computer's abilities. Although these 
principles are widely known, many factors affect how much attention is being
paid to these rules in practice. While a character animator working on a feature
film might spend many hours trying to follow some of these suggestions (for ex¬ 
ample, tweaking his timing to be just right), many game designers tend to believe 
that their time is better spent elsewhere. 


17.1.1 Timing 

Timing, or the speed of action, is at the heart of any animation. How fast things 
happen affects the meaning of action, emotional state, and even perceived weight 
of objects involved. Depending on its speed, the same action, a turn of a charac¬ 
ter’s head from left to right, can mean anything from a reaction to being hit by a 
heavy object to slowly seeking a book on a bookshelf or stretching a neck mus¬ 
cle. It is very important to set timing appropriate for the specific action at hand. 
Action should occupy enough time to be noticed while avoiding too slow and 
potentially boring motions. For computer animation projects involving recorded 
sound, the sound provides a natural timing anchor to be followed. In fact, in most 
productions, the actor’s voice is recorded first and the complete animation is then 
synchronized to this recording. Since large and heavy objects tend to move slower 
than small and light ones (with less acceleration, to be more precise), timing can 
be used to provide significant information about the weight of an object. 


17.1.2 Action Layout 

At any moment during an animation, it should be clear to the viewer what idea (ac¬ 
tion, mood, expression) is being presented. Good staging, or high-level planning 
of the action, should lead a viewer’s eye to where the important action is currently 
concentrated, effectively telling him “look at this, and now, look at this” without 
using any words. Some familiarity with human perception can help us with this 
difficult task. Since human visual systems react mostly to relative changes rather 
than absolute values of stimuli, a sudden motion in a still environment or lack of 
motion in some part of a busy scene naturally draws attention. The same action 
presented so that the silhouette of the object is changing can often be much more 
noticeable compared with a frontal arrangement (see Figure 17.1(a)). 

On a slightly lower level, each action can be split into three parts: anticipation 
(preparation for the action), the action itself and follow-through (termination of 
the action). In many cases the action itself is the shortest part and, in some sense, 
the least interesting. For example, kicking a football might involve extensive 
preparation on the part of the kicker and long “visual tracking” of the departing

Figure 17.1. Action layout. Left: Staging action properly is crucial for bringing attention
to currently important motion. The act of raising a hand would be prominent on the top but
harder to notice on the bottom. A change in nose length, on the contrary, might be completely
invisible in the first case. Note that this might be intentionally hidden, for example, to be 
suddenly revealed later. Neither arrangement is particularly good if both motions should be 
attended to. Middle: The amount of anticipation can tell much about the following action. 
The action which is about to follow (throwing a ball) is very short but it is clear what is about 
to happen. The more wound up the character is, the faster the following action is perceived 
to be. Right: The follow-through phase is especially important for secondary appendages 
(hair) whose motion follows the leading part (head). The motion of the head is very simple, 
but leads to non-trivial follow-through behavior of the hair itself. It is impossible to create a 
natural animation without a follow-through phase and overlapping action in this case. Figure 
courtesy Peter Shirley and Christina Villarruel. 


ball with ample opportunities to show the stress of the moment, emotional state 
of the kicker, and even the reaction to the expected result of the action. The action 
itself (motion of the leg to kick the ball) is rather plain and takes just a fraction of 
a second in this case. 

The goal of anticipation is to prepare the viewer for what is about to happen.
This becomes especially important if the action itself is very fast, greatly important,
or extremely difficult. Creating a more extensive anticipation for such
actions serves to underscore these properties and, in the case of fast events, makes
sure the action will not be missed (see Figure 17.1(b)).

In real life, the main action often causes one or more other overlapping ac¬ 
tions. Different appendages or loose parts of the object typically drag behind the 
main leading section and keep moving for a while in the follow-through part of 
the main action as shown in Figure 17.1(c). Moreover, the next action often starts 
before the previous one is completely over. A player might start running while 
he is still tracking the ball he just kicked. Ignoring such natural flow is gener¬ 
ally perceived as if there are pauses between actions and can result in robot-like 
mechanical motion. While overlapping is necessary to keep the motion natural, 
secondary action is often added by the animator to make motion more interesting 
and achieve realistic complexity of the animation. It is important not to allow 
secondary action to dominate the main action. 


17.1.3 Animation Techniques 

Several specific techniques can be used to make motion look more natural. The 
most important one is probably squash and stretch, which suggests changing the
shape of a moving object in a particular way as it moves. One would generally
stretch an object in the direction of motion and squash it when a force is ap¬ 
plied to it, as demonstrated in Figure 17.2 for a classic animation of a bouncing 
ball. It is important to preserve the total volume as this happens to avoid the il¬ 
lusion of growing or shrinking of the object. The greater the speed of motion (or 
the force), the more stretching (or squashing) is applied. Such deformations are 
used for several reasons. For very fast motion, an object can move between two 
sequential frames so quickly that there is no overlap between the object at the 
time of the current frame and at the time of the previous frame which can lead 
to strobing (a variant of aliasing). Having the object elongated in the direction of 
motion can ensure better overlap and helps the eye to fight this unpleasant effect. 
Stretching/squashing can also be used to show flexibility of the object with more 
deformation applied for more pliable materials. If the object is intended to appear 
as rigid, its shape is purposefully left the same when it moves. 
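
As a small illustration of the volume-preservation requirement, the sketch below computes scale factors for a 3D object. The function names, the `1/sqrt` rule for the two perpendicular axes, and the linear speed-to-stretch mapping with its constants are assumptions made for this example only.

```python
import math

def squash_stretch_scales(stretch):
    """Return (along, across) scale factors for a 3D object.

    stretch > 1 elongates the object along the direction of motion,
    stretch < 1 squashes it. Both perpendicular axes are scaled by
    1/sqrt(stretch), so along * across * across == 1 and volume is kept.
    """
    if stretch <= 0:
        raise ValueError("stretch must be positive")
    across = 1.0 / math.sqrt(stretch)
    return stretch, across

def stretch_from_speed(speed, k=0.1, max_stretch=2.0):
    """Hypothetical mapping: faster motion gives more stretch, up to a clamp."""
    return min(1.0 + k * abs(speed), max_stretch)
```

A rigid-looking object would simply use `stretch = 1` at all speeds.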

Natural motion rarely happens along straight lines, so this should generally be 
avoided in animation and arcs should be used instead. Similarly, no real-world
motion can instantly change its speed; this would require an infinite amount of
force to be applied to an object. It is desirable to avoid such situations in animation
as well. In particular, the motion should start and end gradually (slow in and
out). While hand-drawn animation is sometimes done via straight-ahead action 
with an animator starting at the first frame and drawing one frame after another in 


Figure 17.2. Classic example of applying the squash and stretch principle. Note that the volume of the bouncing ball should remain roughly the same throughout the animation.

Figure 17.3. Keyframing (top, key frames created first) encourages detailed action planning while straight-ahead action (bottom, frames created in straight-ahead order) leads to a more spontaneous result.


sequence until the end, pose-to-pose action, also known as keyframing, is much
more suitable for computer animation. In this technique, animation is carefully
planned through a series of relatively sparsely spaced key frames with the rest
of the animation (in-between frames) filled in only after the keys are set (Figure 17.3).
This allows more precise timing and allows the computer to take over
the most tedious part of the process, the creation of the in-between frames,
using algorithms presented in the next section.

Almost any of the techniques outlined above can be used with some reason¬ 
able amount of exaggeration to achieve greater artistic effect or underscore some 
specific property of an action or a character. The ultimate goal is to achieve some¬ 
thing the audience will want to see, something which is appealing. Extreme com¬ 
plexity or too much symmetry in a character or action tends to be less appealing. 
To create good results, a traditional animator needs solid drawing skills. Analo¬ 
gously, a computer animator should certainly understand computer graphics and 
have a solid knowledge of the tools he uses. 


17.1.4 Animator Control vs. Automatic Methods 

In traditional animation, the animator has complete control over all aspects of the 
production process and nothing prevents the final product from being as it was planned
in every detail. The price paid for this flexibility is that every frame is created by 
hand, leading to an extremely time- and labor-consuming enterprise. In computer 
animation, there is a clear tradeoff between, on the one hand, giving an animator 
more direct control over the result, but asking him to contribute more work and, 
on the other hand, relying on more automatic techniques which might require 
setting just a few input parameters but offer little or no control over some of the 
properties of the result. A good algorithm should provide sufficient flexibility 
while asking the animator only for information which is intuitive, easy to provide,
and which he himself feels is necessary for achieving the desired effect. While 
perfect compliance with this requirement is unlikely in practice since it would 
probably take something close to a mind-reading machine, we do encourage the 
reader to evaluate any computer-animation technique from the point of view of 
providing such balance. 

17.2 Keyframing 

The term keyframing can be misleading when applied to 3D computer animation 
since no actual completed frames (i.e., images) are typically involved. At any 
given moment, a 3D scene being animated is specified by a set of numbers: the

Figure 17.4. Different patterns of setting keys (black circles above) can be used simultaneously
for the same scene. It is assumed that there are more frames before as well as after
this portion.


positions of centers of all objects, their RGB colors, the amount of scaling applied 
to each object in each axis, modeling transformations between different parts of 
a complex object, camera position and orientation, light source intensities, etc. To
animate a scene, some subset of these values has to change with time. One can,
of course, directly set these values at every frame, but this will not be particularly
efficient. Short of that, some number of important moments in time (key frames
t_k) can be chosen along the timeline of animation for each of the parameters and
values of this parameter (key values f_k) are set only for these selected frames.
We will call a combination (t_k, f_k) of key frame and key value simply a key.
Key frames do not have to be the same for different parameters, but it is often
logical to set keys at least for some of them simultaneously. For example, key
frames chosen for x-, y- and z-coordinates of a specific object might be set at
exactly the same frames forming a single position vector key (t_k, p_k). These key
frames, however, might be completely different from those chosen for the object’s 
orientation or color. The closer key frames are to each other, the more control the 
animator has over the result; however the cost of doing more work of setting the 
keys has to be assessed. It is, therefore, typical to have large spacing between 
keys in parts of the animation which are relatively simple, concentrating them in 
intervals where complex action occurs as shown in Figure 17.4. 

Figure 17.5. A continuous curve f(t) is fit through the keys provided by the animator even though only values at frame positions are of interest. The derivative of this function gives the speed of parameter change and is at first determined automatically by the fitting procedure.

Once the animator sets the key (t_k, f_k), the system has to compute values of
f for all other frames. Although we are ultimately interested only in a discrete set
of values, it is convenient to treat this as a classical interpolation problem which
fits a continuous animation curve f(t) through a provided set of data points (Figure 17.5).
Extensive discussion of curve fitting algorithms can be found in Chapter 15,
and we will not repeat it here. Since the animator initially provides only the
keys and not the derivative (tangent), methods which compute all necessary infor¬ 
mation directly from keys are preferable for animation. The speed of parameter 
change along the curve is given by the derivative of the curve with respect to time,
df/dt. Therefore, to avoid sudden jumps in velocity, C^1 continuity is typically
necessary. A higher degree of continuity is typically not required from animation 
curves, since the second derivative, which corresponds to acceleration or applied 
force, can experience very sudden changes in real-world situations (ball hitting a 
solid wall), and higher derivatives do not directly correspond to any parameters of 
physical motion. These considerations make Catmull-Rom splines one of the best
choices for initial animation curve creation. 
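
A minimal sketch of such an initial animation curve is given below, assuming keys are stored as a time-sorted list of `(t_k, f_k)` pairs. Tangents are computed from the keys alone, as a Catmull-Rom spline does, and each segment is evaluated as a cubic Hermite curve. The one-sided tangents at the first and last key and the function names are assumptions of this example.

```python
from bisect import bisect_right

def catmull_rom_tangents(keys):
    """Tangent at each key from its neighbors: (f[k+1] - f[k-1]) / (t[k+1] - t[k-1]).

    End keys fall back to a one-sided difference (an assumption of this sketch).
    """
    n = len(keys)
    tangents = []
    for k in range(n):
        (t0, f0), (t1, f1) = keys[max(k - 1, 0)], keys[min(k + 1, n - 1)]
        tangents.append((f1 - f0) / (t1 - t0))
    return tangents

def evaluate(keys, t):
    """Evaluate the C^1 animation curve f(t) passing through all keys."""
    times = [tk for tk, _ in keys]
    tangents = catmull_rom_tangents(keys)
    # Index of the segment [t_k, t_{k+1}] containing t, clamped to valid segments.
    k = min(max(bisect_right(times, t) - 1, 0), len(keys) - 2)
    (t0, f0), (t1, f1) = keys[k], keys[k + 1]
    h = t1 - t0
    s = (t - t0) / h
    # Cubic Hermite basis functions.
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2
    return h00 * f0 + h10 * h * tangents[k] + h01 * f1 + h11 * h * tangents[k + 1]
```

Sampling `evaluate(keys, frame_time)` at every frame time yields the in-between values; because neighboring segments share the tangent at their common key, the curve has no jumps in velocity.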

Most animation systems give the animator the ability to perform interactive 
fine editing of this initial curve, including inserting more keys, adjusting existing 
keys, or modifying automatically computed tangents. Another useful technique 
which can help to tweak the shape of the curve is called TCB control (TCB stands 
for tension, continuity and bias). The idea is to introduce three new parameters 
which can be used to modify the shape of the curve near a key through coordinated 
adjustment of incoming and outgoing tangents at this point. For keys uniformly 
spaced in time with distance Δt between them, the standard Catmull-Rom expression
for incoming T_k^in and outgoing T_k^out tangents at an internal key (t_k, f_k)
can be rewritten as

\[ T_k^{in} = T_k^{out} = \frac{1}{2\Delta t}(f_{k+1} - f_k) + \frac{1}{2\Delta t}(f_k - f_{k-1}). \]

Modified tangents of a TCB spline are

\[ T_k^{in} = \frac{(1-t)(1-c)(1+b)}{2\Delta t}(f_{k+1} - f_k) + \frac{(1-t)(1+c)(1-b)}{2\Delta t}(f_k - f_{k-1}), \]

\[ T_k^{out} = \frac{(1-t)(1+c)(1+b)}{2\Delta t}(f_{k+1} - f_k) + \frac{(1-t)(1-c)(1-b)}{2\Delta t}(f_k - f_{k-1}). \]


The tension parameter t controls the sharpness of the curve near the key by scaling 
both incoming and outgoing tangents. Larger tangents (lower tension) lead to a 
flatter curve shape near the key. Bias b allows the animator to selectively increase 
the weight of a key’s neighbors locally pulling the curve closer to a straight line 
connecting the key with its left (b near 1, “overshooting” the action) or right (b
near −1, “undershooting” the action) neighbors. A non-zero value of continuity c
makes incoming and outgoing tangents different allowing the animator to create
kinks in the curve at the key value. Practically useful values of TCB parameters
are typically confined to the interval [−1, 1] with defaults t = c = b = 0 corresponding
to the original Catmull-Rom spline. Examples of possible curve shape
adjustments are shown in Figure 17.6.

Figure 17.6. Editing the default interpolating spline (middle column) using TCB controls.
Note that all keys remain at the same positions. Panel labels (one row per parameter): low tension (t < 0), original spline (t = c = b = 0), high tension (t > 0); low continuity (c < 0), original spline, high continuity (c > 0); low bias (b < 0), original spline, high bias (b > 0).
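
The TCB tangent expressions translate directly into code. The sketch below assumes uniformly spaced keys with spacing `dt` and follows the term order of the tangent expressions given in this section; the function name and argument layout are choices made for the example.

```python
def tcb_tangents(f_prev, f_k, f_next, dt, t=0.0, c=0.0, b=0.0):
    """Incoming and outgoing tangents at an internal key of a TCB spline.

    f_prev, f_k, f_next: key values at frames k-1, k, k+1 (uniform spacing dt).
    t, c, b: tension, continuity and bias, normally kept within [-1, 1].
    """
    d_next = f_next - f_k   # f_{k+1} - f_k
    d_prev = f_k - f_prev   # f_k - f_{k-1}
    t_in = ((1 - t) * (1 - c) * (1 + b) * d_next
            + (1 - t) * (1 + c) * (1 - b) * d_prev) / (2 * dt)
    t_out = ((1 - t) * (1 + c) * (1 + b) * d_next
             + (1 - t) * (1 - c) * (1 - b) * d_prev) / (2 * dt)
    return t_in, t_out
```

With the defaults `t = c = b = 0` both tangents reduce to the Catmull-Rom value, tension `t = 1` shrinks both tangents to zero, and any non-zero `c` makes the two tangents differ, which is what produces a kink at the key.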


17.2.1 Motion Controls 

So far, we have described how to control the shape of the animation curve through 
key positioning and fine tweaking of tangent values at the keys. This, however, 
is generally not sufficient when one would like to have control both over where 
the object is moving, i.e., its path, and how fast it moves along this path. Given a 
set of positions in space as keys, automatic curve-fitting techniques can fit a curve 
through them, but resulting motion is only constrained by forcing the object to 
arrive at a specified key position p_k at the corresponding key frame t_k, and nothing
is directly said about the speed of motion between the keys. This can create
problems. For example, if an object moves along the x-axis with velocity 11 meters
per second for 1 second and then with 1 meter per second for 9 seconds, it
will arrive at position x = 20 after 10 seconds thus satisfying animator’s keys 
(0,0) and (10, 20). It is rather unlikely that this jerky motion was actually de¬ 
sired, and uniform motion with speed 2 meters/second is probably closer to what 
the animator wanted when setting these keys. Although typically not displaying

Figure 17.7. All three motions are along the same 2D path and satisfy the set of keys at the tips of the black triangles. The tips of the white triangles show object position at Δt = 1 intervals. Uniform speed of motion between the keys (top) might be closer to what the animator wanted but automatic fitting procedures could result in either of the other two motions.


such extreme behavior, polynomial curves resulting from standard fitting proce¬ 
dures do exhibit non-uniform speed of motion between keys as demonstrated in 
Figure 17.7. While this can be tolerable (within limits) for some parameters for 
which the human visual system is not very good at determining non-uniformities 
in the rate of change (such as color or even rate of rotation), we have to do bet¬ 
ter for position p of the object where velocity directly corresponds to everyday 
experience. 

We will first distinguish curve parameterization used during the fitting proce¬ 
dure from that used for animation. When a curve is fit through position keys, we 
will write the result as a function p(u) of some parameter u. This will describe 
the geometry of the curve in space. The arc length s is the physical length of the 
curve. A natural way for the animator to control the motion along the now existing 
curve is to specify an extra function s(t) which corresponds to how far along the
curve the object should be at any given time. To get an actual position in space,
we need one more auxiliary function u(s) which computes a parameter value u
for given arc length s. The complete process of computing an object position for
a given time t is then given by composing these functions (see Figure 17.8):

\[ \mathbf{p}(t) = \mathbf{p}(u(s(t))). \]

Several standard functions can be used as the distance-time function s(t).
One of the simplest is the linear function corresponding to constant velocity:
s(t) = vt with v = const. Another common example is the motion with constant
acceleration a (and initial speed v_0) which is described by the parabolic
s(t) = v_0 t + a t^2 / 2. Since velocity is changing gradually here, this function
can help to model desirable ease-in and ease-out behavior. More generally, the 



Figure 17.8. To get position in space at a given time t, one first utilizes user-specified motion
control to obtain the distance along the curve s(t) and then computes the corresponding curve
parameter value u(s(t)). The previously fitted curve p(u) can now be used to find the position
p(u(s(t))).

slope of s(t) gives the velocity of motion with negative slope corresponding to
the motion backwards along the curve. To achieve most flexibility, the ability to 
interactively edit s(t) is typically provided to the animator by the animation sys¬ 
tem. The distance-time function is not the only way to control motion. In some 
cases it might be more convenient for the user to specify a velocity-time function 
v(t) or even an acceleration-time function a(t). Since these are correspondingly 
first and second derivatives of s(t), to use these types of controls, the system first
recovers the distance-time function by integrating the user input (twice in the case 
of a(t)). 
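
Recovering the distance-time function from a velocity-time control can be sketched with a cumulative trapezoidal rule. The sampled representation of v(t) and the function name are assumptions of this example; an acceleration-time control would apply the same step twice.

```python
def distance_from_velocity(times, velocities, s0=0.0):
    """Integrate sampled v(t) into sampled s(t) with the trapezoidal rule.

    times: increasing sample times; velocities: v at those times.
    Returns a list of distances, one per sample, starting at s0.
    """
    distances = [s0]
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        mean_v = 0.5 * (velocities[i] + velocities[i - 1])
        distances.append(distances[-1] + mean_v * dt)
    return distances
```

For constant acceleration, where v(t) = v_0 + at is linear, the trapezoidal rule is exact and reproduces the parabolic s(t) = v_0 t + a t^2 / 2.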

The relationship between the curve parameter u and arc length s is established 
automatically by the system. In practice, the system first determines arc length
dependence on parameter u (i.e., the inverse function s(u)). Using this function,
for any given S it is possible to solve the equation s(u) − S = 0 with unknown u
obtaining u(S). For most curves, the function s(u) cannot be expressed in closed
analytic form and numerical integration is necessary (see Chapter 14). Standard
numerical root-finding procedures (such as the Newton-Raphson method, for example)
can then be directly used to solve the equation s(u) − S = 0 for u.
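
The root-finding step can be sketched as below. Arc length is the integral of the curve speed |dp/du|, so the derivative of s(u) needed by Newton-Raphson is the speed itself. Composite Simpson integration, the starting guess `u0`, and the function names are assumptions of this example, and no safeguard is included for curves whose speed vanishes.

```python
import math

def arc_length(speed, u, n=64):
    """Approximate s(u), the integral of speed over [0, u], by Simpson's rule.

    speed: callable returning |dp/du| at a parameter value; n must be even.
    """
    if u == 0.0:
        return 0.0
    h = u / n
    total = speed(0.0) + speed(u)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * speed(i * h)
    return total * h / 3.0

def param_for_length(speed, S, u0=0.5, tol=1e-10, max_iter=50):
    """Solve s(u) - S = 0 for u by Newton-Raphson, using ds/du = speed(u)."""
    u = u0
    for _ in range(max_iter):
        err = arc_length(speed, u) - S
        if abs(err) < tol:
            break
        u -= err / speed(u)
    return u

# Example curve p(u) = (u, u^2), whose speed is sqrt(1 + 4 u^2).
def parabola_speed(u):
    return math.sqrt(1.0 + 4.0 * u * u)
```

Calling `param_for_length(parabola_speed, 1.0)` returns the parameter value at which one unit of length has been traveled along the example parabola.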

An alternative technique is to approximate the curve itself as a set of linear
segments between points p_i computed at some set of sufficiently densely spaced
parameter values u_i. One then creates a table of approximate arc lengths

\[ s(u_j) \approx \sum_{i=1}^{j} \lVert \mathbf{p}_i - \mathbf{p}_{i-1} \rVert = s(u_{j-1}) + \lVert \mathbf{p}_j - \mathbf{p}_{j-1} \rVert. \]

Since s(u) is a non-decreasing function of u, one can then find the interval containing
the value S by simple searching through the table (see Figure 17.9). Linear
interpolation of the interval’s u end values is then performed to finally find u(S).
If greater precision is necessary, a few steps of the Newton-Raphson algorithm 
with this value as the starting point can be applied. 
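
The table-based approach, together with the composition of s(t), u(s) and p(u) described earlier, might look like the following sketch. It assumes the curve is a callable returning a coordinate tuple for u in [0, 1], uses equal parameter increments, and replaces the simple table search with a binary search from the standard `bisect` module; all names are hypothetical.

```python
import math
from bisect import bisect_right

def build_arc_table(curve, n=100):
    """Tabulate approximate arc length at n + 1 equally spaced parameter values."""
    us = [i / n for i in range(n + 1)]
    points = [curve(u) for u in us]
    lengths = [0.0]
    for a, b in zip(points, points[1:]):
        lengths.append(lengths[-1] + math.dist(a, b))
    return us, lengths

def u_from_table(us, lengths, S):
    """Find u(S) by locating the table interval containing S and interpolating."""
    S = min(max(S, 0.0), lengths[-1])
    j = min(bisect_right(lengths, S), len(lengths) - 1)
    s0, s1 = lengths[j - 1], lengths[j]
    if s1 == s0:
        return us[j]
    w = (S - s0) / (s1 - s0)
    return us[j - 1] + w * (us[j] - us[j - 1])

def position_at_time(curve, us, lengths, s_of_t, t):
    """Compose the three functions: p(t) = p(u(s(t)))."""
    return curve(u_from_table(us, lengths, s_of_t(t)))
```

For example, with `us, lengths = build_arc_table(curve)` and a constant-velocity control `s_of_t = lambda t: 2.0 * t`, `position_at_time` returns points that are equally spaced along the path for equally spaced times. The returned u can also serve as the starting point for a few Newton-Raphson steps when more precision is needed.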

17.2.2 Interpolating Rotation 

The techniques presented above can be used to interpolate the keys set for most of 
the parameters describing the scene. Three-dimensional rotation is one important 
motion for which more specialized interpolation methods and representations are 
common. The reason for this is that applying standard techniques to 3D rotations 
often leads to serious practical problems. Rotation (a change in orientation of an 
object) is the only motion other than translation which leaves the shape of the 
object intact. It therefore plays a special role in animating rigid objects. 



Figure 17.9. To create a tabular version of s(u), the curve can be approximated by a number of line segments connecting points on the curve positioned at equal parameter increments. The table is searched to find the u-interval for a given S. For the curve above, for example, the value of u corresponding to the position of S = 6.5 lies between u = 0.6 and u = 0.8.

Figure 17.11. In this example, gimbal lock occurs when a 90 degree turn around axis Z is made. Both X and Y rotations are now performed around the same axis leading to the loss of one degree of freedom. (Panels: original configuration; gimbal lock configuration.)



Figure 17.10. Three Euler angles can be used to specify arbitrary object orientation through 
a sequence of three rotations around coordinate axes embedded into the object (axis Y 
always points to the tip of the cone). Note that each rotation is given in a new coordinate 
system. Fixed angle representation is very similar but the coordinate axes it uses are fixed 
in space and do not rotate with the object. 


There are several ways to specify the orientation of an object. First, trans¬ 
formation matrices as described in Chapter 6 can be used. Unfortunately, naive 
(element-by-element) interpolation of rotation matrices does not produce a correct 
result. For example, the matrix “half-way” between 2D clock- and counterclockwise
90 degree rotation is the null matrix:

\[ \frac{1}{2}\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} + \frac{1}{2}\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}. \]

The correct result is, of course, the unit matrix corresponding to no rotation. Second,
one can specify arbitrary orientation as a sequence of exactly three rotations
around coordinate axes chosen in some specific order. These axes can be fixed in 
space (fixed-angle representation) or embedded into the object therefore changing 
after each rotation ( Euler-angle representation as shown in Figure 17.10). These 
three angles of rotation can be animated directly through standard keyframing, 
but a subtle problem known as gimbal lock arises. Gimbal lock occurs if dur¬ 
ing rotation one of the three rotation axes is by accident aligned with another, 
thereby reducing by one the number of available degrees of freedom as shown in 
Figure 17.11 for a physical device. This effect is more common than one might 
think—a single 90 degree turn to the right (or left) can potentially put an object 
into a gimbal lock. Finally, any orientation can be specified by choosing an appro¬ 
priate axis in space and angle of rotation around this axis. While animating in this 
representation is relatively straightforward, combining two rotations, i.e., finding 
the axis and angle corresponding to a sequence of two rotations both represented 
by axis and angle, is non-trivial. A special mathematical apparatus, quaternions,
has been developed to make this representation suitable both for combining several
rotations into a single one and for animation.
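
Gimbal lock can be checked numerically. The sketch below assumes one particular convention, chosen only for illustration and different from the axis labeling of the figure: the orientation matrix is the product Rz(az) · Ry(ay) · Rx(ax) applied to column vectors. When the middle angle is 90 degrees, the result depends only on the difference ax − az, so two of the three angles no longer give independent control.

```python
import math

def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def euler_matrix(ax, ay, az):
    """Orientation from three angles, assumed order: X first, then Y, then Z."""
    return matmul(rot_z(az), matmul(rot_y(ay), rot_x(ax)))

def max_difference(A, B):
    return max(abs(A[i][j] - B[i][j]) for i in range(3) for j in range(3))

# Two different angle triples with the same ax - az and a 90 degree middle angle.
locked_a = euler_matrix(0.3, math.pi / 2, 0.1)
locked_b = euler_matrix(0.5, math.pi / 2, 0.3)
# The same triples with a generic middle angle.
free_a = euler_matrix(0.3, 0.4, 0.1)
free_b = euler_matrix(0.5, 0.4, 0.3)
```

Here `max_difference(locked_a, locked_b)` is zero up to rounding, while `max_difference(free_a, free_b)` is not, which is the loss of one degree of freedom described above.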

Given a 3D vector v = (x, y, z) and a scalar s, a quaternion q is formed by
combining the two into a four component object: q = [s x y z] = [s; v]. Several
new operations are then defined for quaternions. Quaternion addition simply sums
scalar and vector parts separately:

\[ q_1 + q_2 = [s_1 + s_2;\ \mathbf{v}_1 + \mathbf{v}_2]. \]


Multiplication by a scalar a gives a new quaternion 

aq = [as; av]. 

More complex quaternion multiplication is defined as 


\[ q_1 \cdot q_2 = [s_1 s_2 - \mathbf{v}_1 \cdot \mathbf{v}_2;\ s_1\mathbf{v}_2 + s_2\mathbf{v}_1 + \mathbf{v}_1 \times \mathbf{v}_2], \]

where × denotes a vector cross product. It is easy to see that, similar to matrices,
quaternion multiplication is associative, but not commutative. We will be interested
mostly in normalized quaternions, those for which the quaternion norm
$|q| = \sqrt{s^2 + \mathbf{v}^2}$ is equal to one. One final definition we need is that of an inverse
quaternion:

\[ q^{-1} = \frac{1}{|q|^2}[s;\ -\mathbf{v}]. \]

For a normalized quaternion this reduces to [s; −v].

To represent a rotation by angle φ around an axis passing through the origin
whose direction is given by the normalized vector n, a normalized quaternion

\[ q = [\cos(\phi/2);\ \sin(\phi/2)\,\mathbf{n}] \]

is formed. To rotate point p, one turns it into the quaternion q_p = [0; p] and
computes the quaternion product

\[ q_p' = q \cdot q_p \cdot q^{-1} \]

which is guaranteed to have a zero scalar part and the rotated point as its vector 
part. Composite rotation is given simply by the product of quaternions represent¬ 
ing each of the separate rotation steps. To animate with quaternions, one can treat 
them as points in a four-dimensional space and set keys directly in this space. To 
keep quaternions normalized, one should, strictly speaking, restrict interpolation 
procedures to a unit sphere (a 3D object) in this 4D space. However, a spherical 
version of even linear interpolation (often called slerp) already results in rather 
unpleasant math. Simple 4D linear interpolation followed by projection onto the 
unit sphere shown in Figure 17.12 is much simpler and often sufficient in practice. 
Smoother results can be obtained via repeated application of a linear interpolation 
procedure using the de Casteljau algorithm. 
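
The quaternion operations above are compact enough to sketch directly. The following is a minimal illustration, not a production library: quaternions are stored as plain tuples (s, x, y, z), and the function names (`q_mul`, `q_rotate`, `q_nlerp`, and so on) are hypothetical names chosen for this example. The last function implements the simple 4D linear interpolation followed by projection onto the unit sphere, not slerp.

```python
import math

def q_mul(q1, q2):
    """Quaternion product [s1*s2 - v1.v2; s1*v2 + s2*v1 + v1 x v2]."""
    s1, x1, y1, z1 = q1
    s2, x2, y2, z2 = q2
    return (
        s1 * s2 - (x1 * x2 + y1 * y2 + z1 * z2),
        s1 * x2 + s2 * x1 + (y1 * z2 - z1 * y2),
        s1 * y2 + s2 * y1 + (z1 * x2 - x1 * z2),
        s1 * z2 + s2 * z1 + (x1 * y2 - y1 * x2),
    )

def q_normalize(q):
    """Project a 4D point back onto the unit sphere."""
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)

def q_inverse(q):
    """Conjugate divided by the squared norm (just the conjugate for unit q)."""
    s, x, y, z = q
    n2 = s * s + x * x + y * y + z * z
    return (s / n2, -x / n2, -y / n2, -z / n2)

def q_from_axis_angle(axis, phi):
    """Unit quaternion [cos(phi/2); sin(phi/2) n]; the axis is normalized here."""
    length = math.sqrt(sum(c * c for c in axis))
    k = math.sin(0.5 * phi) / length
    return (math.cos(0.5 * phi), axis[0] * k, axis[1] * k, axis[2] * k)

def q_rotate(q, p):
    """Rotate point p: vector part of q * [0; p] * q^-1."""
    r = q_mul(q_mul(q, (0.0, p[0], p[1], p[2])), q_inverse(q))
    return r[1:]

def q_nlerp(qa, qb, t):
    """Linear interpolation in 4D followed by re-projection onto the sphere."""
    return q_normalize(tuple((1.0 - t) * a + t * b for a, b in zip(qa, qb)))
```

With this convention, `q_mul(q2, q1)` represents the rotation `q1` followed by `q2`, and `q_nlerp` can be evaluated once per frame between two keyed orientations.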



Figure 17.12. Interpolating quaternions should be done on the surface of a 3D unit
sphere embedded in 4D space. However, much simpler interpolation along a 4D straight
line (open circles) followed by re-projection of the results onto the sphere (black
circles) is often sufficient.

Figure 17.13. Popular examples of global deformations. Bending and twist angles as
well as the degree of taper can all be animated to achieve dynamic shape change.
(Surviving panel labels: Original shape, Bend.)



17.3 Deformations 


Although techniques for object deformation might be more properly treated as 
modeling tools, they are traditionally discussed together with animation methods. 
Probably the simplest example of an operation which changes object shape is a 
non-uniform scaling. More generally, some function can be applied to local co¬ 
ordinates of all points specifying the object (i.e., vertices of a triangular mesh 
or control polygon of a spline surface), repositioning these points and creating a 
new shape: $\mathbf{p}' = f(\mathbf{p}, \gamma)$, where $\gamma$ is a vector of parameters used by the deformation
function. Choosing different $f$ (and combining them by applying one after
another) can help to create very interesting deformations. Examples of useful 
simple functions include bend, twist, and taper which are shown in Figure 17.13. 
Animating shape change is very easy in this case by keyframing the parameters 
of the deformation function. Disadvantages of this technique include difficulty of 
choosing the mathematical function for some non-standard deformations and the 
fact that the resulting deformation is global in the sense that the complete object, 
and not just some part of it, is reshaped. 
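
As a rough illustration of such global deformation functions, the sketch below applies a taper and a twist about the z axis to a list of points and keyframes the twist parameter. The specific formulas (a scale factor linear in height, a twist angle proportional to height) and all function names are assumptions made for this example; they are one common way to define these deformations, not the only one.

```python
import math

def taper(p, k):
    """Scale x and y by a factor that changes linearly with height z."""
    x, y, z = p
    s = 1.0 - k * z
    return (s * x, s * y, z)

def twist(p, rate):
    """Rotate the point about the z axis by an angle proportional to z."""
    x, y, z = p
    a = rate * z
    c, s = math.cos(a), math.sin(a)
    return (c * x - s * y, s * x + c * y, z)

def animate_twist(points, rate_start, rate_end, frames):
    """Keyframe the twist parameter: interpolate it linearly over the frames.

    `frames` must be at least 2 (a start key and an end key).
    """
    result = []
    for f in range(frames):
        t = f / (frames - 1)
        rate = (1.0 - t) * rate_start + t * rate_end
        result.append([twist(p, rate) for p in points])
    return result
```

Deformations can be chained by applying one after another, for example `twist(taper(p, 0.5), 1.0)`; every vertex of the object is affected, which is exactly the global behavior described above.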

To deform an object locally while providing more direct control over the re¬ 
sult, one can choose a single vertex, move it to a new location and adjust vertices 
within some neighborhood to follow the seed vertex. The area affected by the de¬ 
formation and the specific amount of displacement in different parts of the object 
are controlled by an attenuation function which decreases with distance (typically 
computed over the object’s surface) to the seed vertex. Seed vertex motion can be 
keyframed to produce animated shape change. 
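
A minimal sketch of this seed-vertex approach follows. Two simplifications are assumed here and are not from the text: distance is measured as straight-line (Euclidean) distance instead of distance over the surface, and the attenuation function is a hypothetical smooth polynomial falloff that reaches zero at a user-chosen radius.

```python
import math

def falloff(d, radius):
    """Attenuation: 1 at the seed, decreasing smoothly to 0 at `radius`."""
    if d >= radius:
        return 0.0
    t = d / radius
    return (1.0 - t * t) ** 2

def move_seed(vertices, seed_index, displacement, radius):
    """Move the seed vertex by `displacement`; neighbors follow, attenuated."""
    seed = vertices[seed_index]
    moved = []
    for v in vertices:
        w = falloff(math.dist(v, seed), radius)  # Euclidean distance (assumption)
        moved.append(tuple(c + w * dc for c, dc in zip(v, displacement)))
    return moved
```

Keyframing `displacement` over time then animates the local shape change, while vertices farther than `radius` from the seed stay where they are.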

A more general deformation technique is called free-form deformation (FFD)
(Sederberg & Parry, 1986). A local (in most cases rectilinear) coordinate grid
is first established to encapsulate the part of the object to be deformed, and
coordinates $(s, t, u)$ of all relevant points are computed with respect to this grid.
The user then freely reshapes the grid of lattice points $\mathbf{P}_{ijk}$ into a new distorted
lattice $\mathbf{P}'_{ijk}$ (Figure 17.14). The object is reconstructed using coordinates
computed in the original undistorted grid in the trivariate analog of Bézier interpolants
(see Chapter 15) with distorted lattice points $\mathbf{P}'_{ijk}$ serving as control points in this
expression:

$$\mathbf{P}(s,t,u) = \sum_{i=0}^{L} \sum_{j=0}^{M} \sum_{k=0}^{N} \binom{L}{i}(1-s)^{L-i}s^{i} \binom{M}{j}(1-t)^{M-j}t^{j} \binom{N}{k}(1-u)^{N-k}u^{k}\, \mathbf{P}'_{ijk},$$

where L, M, N are maximum indices of lattice points in each dimension. In effect,
the lattice serves as a low resolution version of the object for the purpose of
deformation, allowing for a smooth shape change of an arbitrarily complex object
through a relatively small number of intuitive adjustments. FFD lattices can
themselves be treated as regular objects by the system and can be transformed,
animated, and even further deformed if necessary, leading to corresponding changes
in the object to which the lattice is attached. For example, moving a deformation
tool consisting of the original lattice and distorted lattice representing a bulge
across an object results in a bulge moving across the object.
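
The trivariate Bézier reconstruction used by FFD can be sketched as follows. This is an illustrative implementation under simple assumptions: the lattice is a nested list indexed as `lattice[i][j][k]`, lattice coordinates (s, t, u) are already normalized to the range [0, 1], and the names `bernstein`, `ffd_point`, and `unit_lattice` are hypothetical.

```python
from math import comb

def bernstein(n, i, t):
    """Bernstein polynomial: C(n, i) * (1 - t)^(n - i) * t^i."""
    return comb(n, i) * (1.0 - t) ** (n - i) * t ** i

def ffd_point(stu, lattice):
    """Evaluate the deformed position of a point with lattice coordinates stu."""
    s, t, u = stu
    L, M, N = len(lattice) - 1, len(lattice[0]) - 1, len(lattice[0][0]) - 1
    out = [0.0, 0.0, 0.0]
    for i in range(L + 1):
        bi = bernstein(L, i, s)
        for j in range(M + 1):
            bj = bernstein(M, j, t)
            for k in range(N + 1):
                w = bi * bj * bernstein(N, k, u)
                point = lattice[i][j][k]
                for c in range(3):
                    out[c] += w * point[c]
    return tuple(out)

def unit_lattice(L, M, N):
    """Undistorted lattice of (L+1) x (M+1) x (N+1) points filling the unit cube."""
    return [[[(i / L, j / M, k / N) for k in range(N + 1)]
             for j in range(M + 1)]
            for i in range(L + 1)]
```

With an undistorted lattice, `ffd_point` returns each point unchanged; moving a lattice point such as `lattice[2][2][2]` pulls the nearby part of the object along smoothly.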


17.4 Character Animation 

Animation of articulated figures is most often performed through a combination 
of keyframing and specialized deformation techniques. The character model in¬ 
tended for animation typically consists of at least two main layers as shown in 
Figure 17.15. The motion of a highly detailed surface representing the outer shell 
or skin of the character is what the viewer will eventually see in the final prod¬ 
uct. The skeleton underneath it is a hierarchical structure (a tree) of joints which 
provides a kinematic model of the figure and is used exclusively for animation. 
In some cases, additional intermediate layer(s) roughly corresponding to muscles 
are inserted between the skeleton and the skin. 



Figure 17.15. (Left) A hierarchy of joints, a skeleton, serves as a kinematic abstraction of
the character; (middle) repositioning the skeleton deforms a separate skin object attached
to it; (right) a tree data structure is used to represent the skeleton. For compactness, the
internal structure of several nodes is hidden (they are identical to a corresponding sibling).
Joints labeled in the tree: pelvis, spine, collar, rshoulder, elbow, lshoulder, rhip, knee,
ball, lhip.



Figure 17.14. Adjusting the FFD lattice results in the deformation of the object.

Each of the skeleton’s joints acts as a parent for the hierarchy below it. The
root represents the whole character and is positioned directly in the world coordinate
system. If a local transformation matrix which relates a joint to its parent
in the hierarchy is available, one can obtain a transformation which relates the local
space of any joint to the world system (i.e., the system of the root) by simply
concatenating transformations along the path from the root to the joint. To evaluate
the whole skeleton (i.e., find the position and orientation of all joints), a depth-first
traversal of the complete tree of joints is performed. A transformation stack is a
natural data structure to help with this task. While traversing down the tree, the
current composite matrix is pushed on the stack and a new one is created by
multiplying the current matrix with the one stored at the joint. When backtracking
to the parent, this extra transformation should be undone before another branch is
visited; this is easily done by simply popping the stack. Although this general and
simple technique for evaluating hierarchies is used throughout computer graphics,
in animation (and robotics) it is given a special name: forward kinematics (FK).
While general representations for all transformations can be used, it is common to
use specialized sets of parameters, such as link lengths or joint angles, to specify
skeletons. To animate with forward kinematics, rotational parameters of all joints
are manipulated directly. The technique also allows the animator to change the
distance between joints (link lengths), but one should be aware that this corresponds
to limb stretching and can often look rather unnatural.
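
The stack-based traversal described above might look like the following sketch. To keep it short it works in 2D with 3x3 homogeneous matrices, and it assumes each joint is a dictionary holding a link length (the offset from its parent), a joint angle, and a list of children; this data layout and the joint names in the example are hypothetical.

```python
import math

def mat_mul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]

def local_matrix(length, angle):
    """Joint-to-parent transform: translate by `length` along the parent's
    x axis, then rotate by `angle`."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, length], [s, c, 0.0], [0.0, 0.0, 1.0]]

def evaluate_skeleton(root):
    """Depth-first traversal with a transformation stack; returns the world
    position of every joint, keyed by joint name."""
    positions = {}
    stack = []
    current = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

    def visit(joint):
        nonlocal current
        stack.append(current)            # push the composite matrix
        current = mat_mul(current, local_matrix(joint["length"], joint["angle"]))
        positions[joint["name"]] = (current[0][2], current[1][2])
        for child in joint["children"]:
            visit(child)
        current = stack.pop()            # undo this joint before the next branch

    visit(root)
    return positions

# A tiny two-branch example skeleton.
skeleton = {"name": "hip", "length": 0.0, "angle": math.pi / 2, "children": [
    {"name": "knee", "length": 1.0, "angle": -math.pi / 2, "children": [
        {"name": "ankle", "length": 1.0, "angle": 0.0, "children": []}]},
    {"name": "tail", "length": 2.0, "angle": 0.0, "children": []},
]}
```

Changing only the `angle` entries between frames corresponds to animating with forward kinematics; changing `length` corresponds to the limb stretching mentioned above.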

Figure 17.16. Forward kinematics (top) requires the animator to put all joints into correct
position. In inverse kinematics (bottom), parameters of some internal joints are computed
based on desired end effector motion.

Forward kinematics requires the user to set parameters for all joints involved
in the motion (Figure 17.16 (top)). Most of these joints, however, belong to internal
nodes of the hierarchy, and their motion is typically not something the
animator wants to worry about. In most situations, the animator just wants them 
to move naturally “on their own,” and one is much more interested in specify¬ 
ing the behavior of the end point of a joint chain, which typically corresponds to 
something performing a specific action, such as an ankle or a tip of a finger. The 
animator would rather have parameters of all internal joints be determined from 
the motion of the end effector automatically by the system. Inverse kinematics 
(IK) allows us to do just that (see Figure 17.16 (bottom)). 

Let x be the position of the end effector and a be the vector of parameters 
needed to specify all internal joints along the chain from the root to the final joint. 
Sometimes the orientation of the final joint is also directly set by the animator, in 
which case we assume that the corresponding variables are included in the vector 
x. For simplicity, however, we will write all specific expressions for the vector: 

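$$\mathbf{x} = (x_1, x_2, x_3)^T.$$

Since each of the variables in $\mathbf{x}$ is a function of $\mathbf{a}$, it can be written as a vector
equation $\mathbf{x} = \mathbf{F}(\mathbf{a})$. If we change the internal joint parameters by a small amount
$\delta\mathbf{a}$, a resulting change $\delta\mathbf{x}$ in the position of the end effector can be approximately
written as

$$\delta\mathbf{x} = \frac{\partial \mathbf{F}}{\partial \mathbf{a}}\,\delta\mathbf{a}, \qquad (17.1)$$

where $\frac{\partial \mathbf{F}}{\partial \mathbf{a}}$ is the matrix of partial derivatives called the Jacobian:

$$\frac{\partial \mathbf{F}}{\partial \mathbf{a}} = \left[ \frac{\partial F_i}{\partial a_j} \right], \quad i = 1, 2, 3, \quad j = 1, \ldots, n.$$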

At each moment in time, we know the desired position of the end effector (set by 
the animator) and, of course, the effector’s current position. Subtracting the two, 
we will get the desired adjustment $\delta\mathbf{x}$. Elements of the Jacobian matrix are related
to changes in a coordinate of the end effector when a particular internal parameter 
is changed while others remain fixed (see Figure 17.17). These elements can 
be computed for any given skeleton configuration using geometric relationships. 
The only remaining unknowns in the system of equations (17.1) are the changes in 
internal parameters $\delta\mathbf{a}$. Once we solve for them, we update $\mathbf{a} = \mathbf{a} + \delta\mathbf{a}$, which gives
all the necessary information for the FK procedure to reposition the skeleton. 

Figure 17.17. Partial derivative $\partial\mathbf{x}/\partial a_{knee}$ is given by the limit of
$\Delta\mathbf{x}/\Delta a_{knee}$. Effector displacement is computed while all joints, except the
knee, are kept fixed.

Figure 17.18. Multiple configurations of internal joints can result in the same effector
position. (Top) disjoint “flipped” solutions; (bottom) a continuum of solutions.

Figure 17.20. Top: Rigid skinning assigns skin vertices to a specific joint. Those belonging
to the elbow joint are shown in black; Bottom: Soft skinning can blend the influence of
several joints. Weights for the elbow joint are shown (lighter = greater weight). Note
smoother skin deformation of the inner part of the skin near the joint.

Unfortunately, the system (17.1) cannot usually be solved analytically and,
moreover, it is in most cases underconstrained, i.e., the number of unknown internal
joint parameters a exceeds the number of variables in vector x. This means
that different motions of the skeleton can result in the same motion of the end
effector. Some examples are shown in Figure 17.18. Many ways of obtaining a
specific solution for such systems are available, including those taking into account
natural constraints needed for some real-life joints (bending a knee only in
one direction, for example). One should also remember that the computed Jaco¬ 
bian matrix is valid only for one specific configuration, and it has to be updated as 
the skeleton moves. The complete IK framework is presented in Figure 17.19. Of 
course, the root joint for IK does not have to be the root of the whole hierarchy, 
and multiple IK solvers can be applied to independent parts of the skeleton. For 
example, one can use separate solvers for right and left feet and yet another one 
to help animate grasping with the right hand, each with its own root. 
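
One of the many possible ways to pick a specific solution is sketched below for a planar chain of three joints, which is underconstrained (three angles, two effector coordinates). The choices made here are assumptions for illustration only: the Jacobian is estimated by finite differences (moving one joint at a time, as in Figure 17.17), and the update uses a damped least-squares solve, which favors small joint changes. The function names and the `damping` and `max_move` parameters are hypothetical.

```python
import math

def end_effector(angles, lengths):
    """Forward kinematics of a planar chain; joint angles are relative."""
    x = y = total = 0.0
    for a, l in zip(angles, lengths):
        total += a
        x += l * math.cos(total)
        y += l * math.sin(total)
    return (x, y)

def jacobian(angles, lengths, h=1e-6):
    """Columns of the 2 x n Jacobian: effector change per unit change of one
    joint angle while all other joints are kept fixed."""
    x0 = end_effector(angles, lengths)
    cols = []
    for j in range(len(angles)):
        bumped = list(angles)
        bumped[j] += h
        x1 = end_effector(bumped, lengths)
        cols.append(((x1[0] - x0[0]) / h, (x1[1] - x0[1]) / h))
    return cols

def ik_step(angles, lengths, target, damping=0.1, max_move=0.5):
    """One update a = a + da with da = J^T (J J^T + damping^2 I)^-1 dx."""
    x = end_effector(angles, lengths)
    dx = [target[0] - x[0], target[1] - x[1]]
    dist = math.hypot(dx[0], dx[1])
    if dist > max_move:                  # keep the linearization reasonable
        dx = [d * max_move / dist for d in dx]
    cols = jacobian(angles, lengths)
    a11 = sum(c[0] * c[0] for c in cols) + damping ** 2
    a12 = sum(c[0] * c[1] for c in cols)
    a22 = sum(c[1] * c[1] for c in cols) + damping ** 2
    det = a11 * a22 - a12 * a12
    y0 = (a22 * dx[0] - a12 * dx[1]) / det
    y1 = (-a12 * dx[0] + a11 * dx[1]) / det
    return [a + c[0] * y0 + c[1] * y1 for a, c in zip(angles, cols)]

def solve_ik(angles, lengths, target, iterations=200, tol=1e-4):
    """Repeat: recompute the Jacobian for the current pose, then update."""
    for _ in range(iterations):
        if math.dist(end_effector(angles, lengths), target) < tol:
            break
        angles = ik_step(angles, lengths, target)
    return angles
```

Note that the Jacobian is recomputed on every iteration, since it is valid only for the current configuration. Joint limits, such as a knee bending in one direction only, are not handled in this sketch.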

A combination of FK and IK approaches is typically used to animate the skeleton. Many
common motions (walking or running cycles, grasping, reaching, etc.) exhibit well-known
patterns of mutual joint motion, making it possible to quickly create natural-looking
motion or even use a library of such “clips.” The animator then adjusts this generic result
according to the physical parameters of the character and also to give it more individuality.

Figure 17.19. A diagram of the inverse kinematics algorithm.

When a skeleton changes its position, it acts as a special type of deformer
applied to the skin of the character. The motion is transferred to this surface by
assigning each skin vertex one (rigid skinning) or more (smooth skinning) joints
as drivers (see Figure 17.20). In the first case, a skin vertex is simply frozen
into the local space of the corresponding joint, which can be the one nearest in
space or one chosen directly by the user. The vertex then repeats whatever motion
this joint experiences, and its position in world coordinates is determined by
the standard FK procedure. Although it is simple, rigid skinning makes it difficult
to obtain sufficiently smooth skin deformation in areas near the joints or to produce
more subtle effects resembling breathing or muscle action. Additional specialized
deformers called flexors can be used for this purpose. In smooth skinning, several
joints can influence a skin vertex according to some weight assigned by the animator,
providing more detailed control over the results. Displacement vectors $\mathbf{d}_i$
suggested by different joints affecting a given skin vertex (each again computed
with standard FK) are averaged according to their weights $w_i$ to compute the final
displacement of the vertex, $\mathbf{d} = \sum_i w_i \mathbf{d}_i$. Normalized weights ($\sum_i w_i = 1$)
are the most common but not fundamentally necessary. Setting smooth skinning
weights to achieve the desired effect is not easy and requires significant skill from
the animator.
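
The weighted displacement sum can be sketched in a few lines. In this 2D illustration each joint is represented by a function that maps a rest-pose position to where the vertex would go if it were rigidly attached to that joint; the function names and the elbow example are hypothetical.

```python
import math

def rotate_about(p, pivot, angle):
    """Rigid motion of a joint: rotate point p about `pivot` by `angle`."""
    c, s = math.cos(angle), math.sin(angle)
    dx, dy = p[0] - pivot[0], p[1] - pivot[1]
    return (c * dx - s * dy + pivot[0], s * dx + c * dy + pivot[1])

def skin_vertex(p, influences):
    """Blend the displacements d_i suggested by each joint with weights w_i.

    `influences` is a list of (weight, transform) pairs. A single pair with
    weight 1 reproduces rigid skinning.
    """
    dx = dy = 0.0
    for w, transform in influences:
        qx, qy = transform(p)
        dx += w * (qx - p[0])
        dy += w * (qy - p[1])
    return (p[0] + dx, p[1] + dy)

# Example: the upper arm stays put while the forearm bends 90 degrees
# about an elbow located at (1, 0).
def upper_arm(p):
    return p

def forearm(p):
    return rotate_about(p, (1.0, 0.0), math.pi / 2)
```

For a vertex at (2, 0), `skin_vertex((2.0, 0.0), [(1.0, forearm)])` follows the forearm rigidly, while equal weights of 0.5 for `upper_arm` and `forearm` place it halfway between the two suggested positions.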


17.4.1 Facial Animation 

Skeletons are well suited for creating most motions of a character’s body, but they 
are not very convenient for realistic facial animation. The reason is that the skin 
of a human face is moved by muscles directly attached to it contrary to other parts 
of the body where the primary objective of the muscles is to move the bones of 
the skeleton and any skin deformation is a secondary outcome. The result of this 
facial anatomical arrangement is a very rich set of dynamic facial expressions 
humans use as one of the main instruments of communication. We are all very 
well trained to recognize such facial variations and can easily notice any unnatural 
appearance. This not only puts special demands on the animator but also requires 
a high-resolution geometric model of the face and, if photorealism is desired, 
accurate skin reflection properties and textures. 

While it is possible to set key poses of the face vertex-by-vertex and inter¬ 
polate between them or directly simulate the behavior of the underlying muscle 
structure using physics-based techniques (see Section 17.5 below), more special¬ 
ized high-level approaches also exist. The static shape of a specific face can be 
characterized by a relatively small set of so-called conformational parameters 
(overall scale, distance from the eye to the forehead, length of the nose, width of 
the jaws, etc.) which are used to morph a generic face model into one with individ¬ 
ual features. An additional set of expressive parameters can be used to describe 
the dynamic shape of the face for animation. Examples include rigid rotation of 
the head, how wide the eyes are open, movement of some feature point from its 
static position, etc. These are chosen so that most of the interesting expressions 
can be obtained through some combination of parameter adjustments, thereby
allowing a face to be animated via standard keyframing. To achieve a higher level
of control, one can use expressive parameters to create a set of expressions corre¬ 
sponding to common emotions (neutral, sadness, happiness, anger, surprise, etc.) 
and then blend these key poses to obtain a “slightly sad” or “angrily surprised” 
face. Similar techniques can be used to perform lip-synch animation, but key 
poses in this case correspond to different phonemes. Instead of using a sequence
of static expressions to describe a dynamic one, the Facial Action Coding System
(FACS) (Eckman & Friesen, 1978) decomposes dynamic facial expressions
directly into a sum of elementary motions called action units (AUs). The set of 
AUs is based on extensive psychological research and includes such movements 
as raising the inner brow, wrinkling the nose, stretching lips, etc. Combining AUs 
can be used to synthesize a necessary expression. 


17.4.2 Motion Capture 



Figure 17.21. Optical 
motion capture: markers 
attached to a performer's 
body allow skeletal motion 
to be extracted. Image 
courtesy of Motion Analysis 
Corp. 


Even with the help of the techniques described above, creating realistic-looking 
character animation from scratch remains a daunting task. It is therefore only 
natural that much attention is directed towards techniques which record an actor's 
motion in the real world and then apply it to computer-generated characters. Two 
main classes of such motion capture (MC) techniques exist: electromagnetic and 
optical. 

In electromagnetic motion capture, an electromagnetic sensor directly mea¬ 
sures its position (and possibly orientation) in 3D often providing the captured 
results in real time. Disadvantages of this technique include significant equip¬ 
ment cost, possible interference from nearby metal objects, and noticeable size 
of sensors and batteries which can be an obstacle in performing high-amplitude 
motions. In optical MC, small colored markers are used instead of active sensors 
making it a much less intrusive procedure. Figure 17.21 shows the operation of 
such a system. In the most basic arrangement, the motion is recorded by two cali¬ 
brated video cameras, and simple triangulation is used to extract the marker's 3D 
position. More advanced computer vision algorithms used for accurate tracking 
of multiple markers from video are computationally expensive, so, in most cases, 
such processing is done offline. Optical tracking is generally less robust than
electromagnetic. Occlusion of a given marker in some frames, possible misidentification
of markers, and noise in images are just a few of the common problems
which have to be addressed. Introducing more cameras observing the motion from
different directions improves both accuracy and robustness, but this approach is
more expensive and it takes longer to process such data. Optical MC becomes
more attractive as available computational power increases and better computer
vision algorithms are developed. Because of the low-impact nature of markers, optical
methods are suitable for delicate facial motion capture and can also be used
with objects other than humans (for example, animals or even tree branches in
the wind).

With several sensors or markers attached to a performer’s body, a set of time-dependent
3D positions of some collection of points can be recorded. These tracking
locations are commonly chosen near joints, but, of course, they still lie on the skin
surface and not at points where actual bones meet. Therefore, some additional
care and a bit of extra processing are necessary to convert recorded positions into
those of the physical skeleton joints. For example, putting two markers on opposite
sides of the elbow or ankle allows the system to obtain a better joint position
by averaging locations of the two markers. Without such extra care, very noticeable
artifacts can appear due to offset joint positions as well as inherent noise
and insufficient measurement accuracy. Because of physical inaccuracy during
motion, character limbs can, for example, lose contact with objects they are supposed
to touch during walking or grasping, and problems like foot-sliding (skating)
of the skeleton can occur. Most of these problems can be corrected by using inverse
kinematics techniques which can explicitly force the required behavior of
the limb’s end.

Recovered joint positions can now be directly applied to the skeleton of a 
computer-generated character. This procedure assumes that the physical dimen¬ 
sions of the character are identical to those of the performer. Retargeting recorded 
motion to a different character and, more generally, editing MC data, requires 
significant care to satisfy necessary constraints (such as maintaining feet on the 
ground or not allowing an elbow to bend backwards) and preserve an overall nat¬ 
ural appearance of the modified motion. Generally, the greater the desired change 
from the original, the less likely it will be possible to maintain the quality of the 
result. An interesting approach to the problem is to record a large collection of 
motions and stitch together short clips from this library to obtain the desired movement.
Although this topic is currently a very active research area, limited ability
to adjust the recorded motion to the animator’s needs remains one of the main
disadvantages of the motion capture technique.


17.5 Physics-Based Animation 

The world around us is governed by physical laws many of which can be formal¬ 
ized as sets of partial or, in some simpler cases, ordinary differential equations. 
One of the original applications of computers was (and remains) solving such 
equations. It is therefore only natural to attempt to use numerical techniques 
developed over the past several decades to obtain realistic motion for computer
animation. 

Figure 17.22. Realistic cloth simulation is often performed with physics-based methods.
In this example, forces are due to collisions and gravity.

Because of its relative complexity and significant cost, physics-based animation
is most commonly used in situations when other techniques are either unavailable
or do not produce sufficiently realistic results. Prime examples include
animation of fluids (which includes many gaseous phase phenomena described
by the same equations—smoke, clouds, fire, etc.), cloth simulation (an exam¬ 
ple is shown in Figure 17.22), rigid body motion, and accurate deformation of 
elastic objects. Governing equations and details of commonly used numerical 
approaches are different in each of these cases, but many fundamental ideas and 
difficulties remain applicable across applications. Many methods for numerically 
solving ODEs and PDEs exist, but discussing them in detail is far beyond the
scope of this book. To give the reader a flavor of physics-based techniques and 
some of the issues involved, we will briefly mention here only the finite differ¬ 
ence approach—one of the conceptually simplest and most popular families of 
algorithms which has been applied to most, if not all, differential equations en¬ 
countered in animation. 

The key idea of this approach is to replace a differential equation with its dis¬ 
crete analog—a difference equation. To do this, the continuous domain of interest 
is represented by a finite set of points at which the solution will be computed. In 
the simplest case, these are defined on a uniform rectangular grid as shown in Fig¬ 
ure 17.23. Every derivative present in the original ODE or PDE is then replaced 
by its approximation through function values at grid points. One way of doing 
this is to subtract the function value at a given point from the function value for 
its neighboring point on the grid: 


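$$\frac{df(t)}{dt} \approx \frac{\Delta f}{\Delta t} = \frac{f(t+\Delta t) - f(t)}{\Delta t} \quad \text{or} \quad \frac{\partial f(x,t)}{\partial x} \approx \frac{\Delta f}{\Delta x} = \frac{f(x+\Delta x, t) - f(x,t)}{\Delta x} \qquad (17.2)$$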



Figure 17.23. Two possible difference schemes for an equation involving derivatives
$\partial f/\partial x$ and $\partial f/\partial t$. (Left) An explicit scheme expresses unknown values (open circles)
only through known values at the current (black circles) and possibly past (gray circles)
time; (Right) Implicit schemes mix known and unknown values in a single equation making
it necessary to solve all such equations as a system. For both schemes, information about
values on the right boundary is needed to close the process.

These expressions are, of course, not the only way. One can, for example, use
$f(t - \Delta t)$ instead of $f(t)$ above and divide by $2\Delta t$. For an equation containing
a time derivative, it is now possible to propagate values of an unknown function
forward in time in a sequence of $\Delta t$-size steps by solving the system of difference
equations (one at each spatial location) for unknown $f(t + \Delta t)$. Some initial
conditions, i.e., values of the unknown function at t = 0, are necessary to start the
process. Other information, such as values on the boundary of the domain, might
also be required depending on the specific problem.

The computation of $f(t + \Delta t)$ can be done easily for so-called explicit schemes
when all other values present are taken at the current time and the only unknown
in the corresponding difference equation, $f(t + \Delta t)$, is expressed through these
known values. Implicit schemes mix values at current and future times and might
use, for example,

$$\frac{f(x + \Delta x, t + \Delta t) - f(x, t + \Delta t)}{\Delta x}$$

as an approximation of $\partial f/\partial x$. In this case one has to solve a system of algebraic
equations at each step.

The choice of difference scheme can dramatically affect all aspects of the
algorithm. The most obvious among them is accuracy. In the limit $\Delta t \to 0$
or $\Delta x \to 0$, expressions of the type in Equation (17.2) are exact, but for finite
step size some schemes allow better approximation of the derivative than others.
Stability of a difference scheme is related to how fast numerical errors, which are
always present in practice, can grow with time. For stable schemes this growth is
bounded, while for unstable ones it is exponential and can quickly overwhelm the
solution one seeks (see Figure 17.24). It is important to realize that while some
inaccuracy in the solution is tolerable (and, in fact, accuracy demanded in physics
and engineering is rarely needed for animation), an unstable result is completely
meaningless, and one should avoid using unstable schemes. Generally, explicit
schemes are either unstable or can become unstable at larger step sizes, while
implicit ones are typically unconditionally stable. Implicit schemes allow a greater
step size (and, therefore, fewer steps), which is why they are popular despite the need to
solve a system of algebraic equations at each step. Explicit schemes are attractive
because of their simplicity if their stability conditions can be satisfied. Developing
a good difference scheme and corresponding algorithm for a specific problem is
not easy, and for most standard situations it is well advised to use an existing
method. Ample literature discussing details of these techniques is available.
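
The difference in stability can be seen on a very small example. The sketch below uses the decay equation df/dt = -k f, chosen here only as a convenient test problem, and advances it with the forward difference evaluated at the current time (explicit) and at the future time (implicit). For this linear equation the implicit update can be solved for the unknown by hand; in general it requires solving a system of equations. The function names are hypothetical.

```python
def explicit_euler(f0, k, dt, steps):
    """(f(t+dt) - f(t)) / dt = -k * f(t): the unknown is given directly."""
    f = f0
    for _ in range(steps):
        f = f + dt * (-k * f)
    return f

def implicit_euler(f0, k, dt, steps):
    """(f(t+dt) - f(t)) / dt = -k * f(t+dt): solve for f(t+dt) at each step."""
    f = f0
    for _ in range(steps):
        f = f / (1.0 + k * dt)
    return f
```

With k = 10 and a step of 0.25, each explicit step multiplies the value by -1.5, so the result oscillates and grows without bound even though the exact solution decays; the implicit scheme decays for any positive step size. With a small step such as 0.01, both schemes stay close to the exact solution.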

Figure 17.24. An unstable solution might follow the exact one initially, but can deviate
arbitrarily far from it with time. Accuracy of a stable solution might still be insufficient
for a specific application.

One should remember that, in many cases, just computing all necessary terms
in the equation is a difficult and time-consuming task on its own. In rigid body
or cloth simulation, for example, most of the forces acting on the system are due
to collisions among objects. At each step during animation, one therefore has to
solve a purely geometric, but very non-trivial, problem of collision detection. In 
such conditions, schemes which require fewer evaluations of such forces might 
provide significant computational savings. 

Although the result of solving appropriate time-dependent equations gives
very realistic motion, this approach has its limitations. First of all, it is very 
hard to control the result of physics-based animation. Fundamental mathematical 
properties of these equations state that once the initial conditions are set, the solu¬ 
tion is uniquely defined. This does not leave much room for animator input and, if 
the result is not satisfactory for some reason, one has only a few options. They are 
mostly limited to adjusting the initial conditions used, changing physical properties of
the system, or even modifying the equations themselves by introducing artificial 
terms intended to “drive” the solution in the direction the animator wants. Making 
such changes requires significant skill as well as understanding of the underlying 
physics and, ideally, numerical methods. Without this knowledge, the realism 
provided by physics-based animation can be destroyed or severe numerical prob¬ 
lems might appear. 


17.6 Procedural Techniques 

Imagine that one could write (and implement on a computer) a mathematical func¬ 
tion which outputs precisely the desired motion given some animator guidance. 
Physics-based techniques outlined above can be treated as a special case of such 
an approach when the “function” involved is the procedure to solve a particular 
differential equation and “guidance” is the set of initial and boundary conditions, 
extra equation terms, etc. 

However, if we are only concerned with the final result, we do not have to
follow a physics-based approach. For example, a simple constant amplitude
wave on the surface of a lake can be directly created by applying the function
$f(\mathbf{x}, t) = A\cos(\omega t - \mathbf{k}\cdot\mathbf{x} + \phi)$ with constant frequency $\omega$, wave vector $\mathbf{k}$, and
phase $\phi$ to get displacement at the 2D point $\mathbf{x}$ at time $t$. A collection of such
waves with random phases and appropriately chosen amplitudes, frequencies, and
wave vectors can result in a very realistic animation of the surface of water without
explicitly solving any fluid dynamics equations. It turns out that other rather
simple mathematical functions can also create very interesting patterns or objects.
Several such functions, most based on lattice noises, have been described in Chapter 11.
Adding time dependence to these functions allows us to animate certain
complex phenomena much more easily and cheaply than with physics-based techniques
while maintaining very high visual quality of the results. If noise(x) is the underlying
pattern-generating function, one can create a time-dependent variant of it
by moving the argument position through the lattice. The simplest case is motion
with constant speed: timenoise(x, t) = noise(x + vt), but more complex motion
through the lattice is, of course, also possible and, in fact, more common. One
such path, a spiral, is shown in Figure 17.25. Another approach is to animate the
parameters used to generate the noise function. This is especially appropriate if the
appearance changes significantly with time (a cloud becoming more turbulent,
for example). In this way one can animate the dynamic process of formation of
clouds using the function which generates static ones.
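
Both ideas from this paragraph fit in a short sketch: a water height built as a sum of cosine waves, and a time-dependent pattern obtained by moving the argument at constant velocity. The way amplitudes, frequencies, and wave vectors are chosen in `make_waves` is an arbitrary assumption for illustration, and `pattern` stands for any static pattern-generating function (such as a lattice noise) supplied by the caller.

```python
import math
import random

def make_waves(n, seed=0):
    """Build n waves (A, omega, k, phi) with random directions and phases."""
    rng = random.Random(seed)
    waves = []
    for i in range(n):
        amplitude = 0.5 / (i + 1)
        omega = 1.0 + i
        direction = rng.uniform(0.0, 2.0 * math.pi)
        k_mag = 0.5 * (i + 1)
        k = (k_mag * math.cos(direction), k_mag * math.sin(direction))
        phi = rng.uniform(0.0, 2.0 * math.pi)
        waves.append((amplitude, omega, k, phi))
    return waves

def height(x, t, waves):
    """Surface displacement: sum of A * cos(omega * t - k.x + phi)."""
    return sum(a * math.cos(omega * t - (k[0] * x[0] + k[1] * x[1]) + phi)
               for a, omega, k, phi in waves)

def time_noise(pattern, x, t, v):
    """timenoise(x, t) = pattern(x + v * t): move the argument at velocity v."""
    return pattern((x[0] + v[0] * t, x[1] + v[1] * t))
```

Evaluating `height` over a grid of points for successive values of `t` gives an animated surface; replacing the straight path `x + v * t` in `time_noise` with a curved one corresponds to the more complex motion through the lattice mentioned above.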

For some procedural techniques, time dependence is a more integral component.
The simplest cellular automata operate on a 2D rectangular grid where
a binary value is stored at each location (cell). To create a time-varying pattern,
some user-provided rules for modifying these values are repeatedly applied.
Rules typically involve some set of conditions on the current value of a cell and
those of its neighbors. For example, the rules of the popular 2D Game of Life cellular
automaton invented in 1970 by British mathematician John Conway are the
following:

1. A dead cell (i.e., binary value at a given location is 0) with exactly three 
live neighbors becomes a live cell (i.e., its value set to 1). 

2. A live cell with two or three live neighbors stays alive. 

3. In all other cases, a cell dies or remains dead. 
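The three rules translate almost directly into code. The sketch below is one possible implementation; the function name and the decision to treat cells outside the grid as dead are assumptions, since the rules themselves do not say how borders are handled.

```python
def life_step(grid):
    """Apply the three Game of Life rules once to a 2D list of 0/1 values.

    Cells outside the grid are treated as dead (an assumption made here).
    """
    rows, cols = len(grid), len(grid[0])

    def live_neighbors(r, c):
        count = 0
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == 0 and dc == 0:
                    continue
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    count += grid[rr][cc]
        return count

    # All cells are updated from the old grid, so the rules act in parallel.
    new_grid = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            n = live_neighbors(r, c)
            if grid[r][c] == 0 and n == 3:         # rule 1: birth
                new_grid[r][c] = 1
            elif grid[r][c] == 1 and n in (2, 3):  # rule 2: survival
                new_grid[r][c] = 1
            # rule 3: every other cell stays 0 in new_grid
    return new_grid
```

Writing the result into a separate grid matters: updating cells in place would let already-modified neighbors influence later cells within the same cycle.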

Once the rules are applied to all grid locations, a new pattern is created and a
new evolution cycle can be started. Three sample snapshots of the live cell
distribution at different times are shown in Figure 17.26. More sophisticated automata
simultaneously operate on several 3D grids of possibly floating point values and
can be used for modeling dynamics of clouds and other gaseous phenomena or
biological systems for which this apparatus was originally invented (note the
terminology). Surprising pattern complexity can arise from just a few well-chosen
rules, but how to write such rules to create the desired behavior is often not
obvious. This is a common problem with procedural techniques: there is only limited,
if any, guidance on how to create new procedures or even adjust parameters of
existing ones. Therefore, a lot of tweaking and learning by trial-and-error (“by
experience”) is usually needed to unlock the full potential of procedural methods.

Figure 17.25. A path through the cube defining procedural noise is traversed to
animate the resulting pattern.

Figure 17.26. Several (non-consecutive) stages in the evolution of a Game of Life
automaton. Live cells are shown in black. Stable objects, oscillators, travelling
patterns, and many other interesting constructions can result from the application
of very simple rules. Figure created using a program by Alan Hensel.

Figure 17.27. Consecutive derivation steps using a simple L-system. Capital letters
denote nonterminals and illustrate positions at which the corresponding nonterminal
will be expanded. They are not part of the actual output.

Another interesting approach which was also originally developed to describe
biological objects is the technique called L-systems (named after their inventor,
Aristid Lindenmayer). This approach is based on grammars, or sets of
recursive rules for rewriting strings of symbols. There are two types of symbols.
Terminal symbols stand for elements of something we want to represent with a
grammar. Depending on their meaning, grammars can describe the structure of trees
and bushes, buildings and whole cities, or programming and natural languages.
In animation, L-systems are most popular for representing plants, and the
corresponding terminals are instructions to the geometric modeling system: put a leaf
(or a branch) at the current position (we will use the symbol @ and just draw a
circle), move the current position forward by some number of units (symbol f), turn
the current direction 60 degrees around the world Z-axis (symbol +), push (symbol [)
or pop (symbol ]) the current position/orientation, etc. Auxiliary nonterminal symbols
(denoted by capital letters) have only semantic rather than any direct meaning. They
are intended to be eventually rewritten through terminals. We start from the
special nonterminal start symbol S and keep applying grammar rules to the current
string in parallel, i.e., replace all nonterminals currently present to get the new
string, until we end up with a string containing only terminals, so that no more
substitution is possible. This string of modeling instructions is then used to
output the actual geometry. For example, a set of rules (productions)

    S → A
    A → [+B]fA
    A → B
    B → fB
    B → f@

might result in the following sequence of rewriting steps, demonstrated in
Figure 17.27:

    S ↦ A ↦ [+B]fA ↦ [+fB]f[+B]fA ↦
    [+ff@]f[+fB]fB ↦ ...

As shown above, there are typically many different productions for the same
nonterminal, allowing the generation of many different objects with the same
grammar. The choice of which rule to apply can depend on which symbols are located
next to the one being replaced (context-sensitivity) or can be performed at
random with some assigned probability for each rule (stochastic L-systems). More
complex rules can model interaction with the environment, such as pruning to a
particular shape, and parameters can be associated with symbols to control the
geometric commands issued.
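The parallel rewriting process can be sketched as follows, using the example productions from the text. The names `RULES`, `rewrite`, and `derive`, and the `max_steps` cutoff, are hypothetical and chosen only for this illustration. The rule choice is passed in as a function so that the same code covers both a fixed choice and a stochastic L-system with equal probabilities.

```python
import random

# Productions from the example grammar; each nonterminal has alternatives.
RULES = {
    "S": ["A"],
    "A": ["[+B]fA", "B"],
    "B": ["fB", "f@"],
}


def rewrite(string, rules, choose):
    """Replace every nonterminal in `string` in parallel.

    `choose` picks one production from the list of alternatives.
    """
    return "".join(choose(rules[ch]) if ch in rules else ch for ch in string)


def derive(start, rules, choose, max_steps=20):
    """Rewrite until only terminals remain or `max_steps` rewrites are done."""
    history = [start]
    while len(history) <= max_steps and any(ch in rules for ch in history[-1]):
        history.append(rewrite(history[-1], rules, choose))
    return history


if __name__ == "__main__":
    # Stochastic variant: every alternative is equally likely.
    for step in derive("S", RULES, random.Random(7).choice):
        print(step)
```

Each string in the returned history is one intermediate derivation step, so the list can be handed to a geometry interpreter one entry at a time.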

L-systems already capture plant topology changes with time: each intermediate
string obtained in the rewriting process can be interpreted as a “younger”
version of the plant (see Figure 17.27). For more significant changes, different
productions can be in effect at different times, allowing the structure of the plant
to change significantly as it grows. A young tree, for example, produces a lot of
new branches while an older one branches only moderately.

Very realistic plant models have been created with L-systems. However, as 
with most procedural techniques, one needs some experience to meaningfully 
apply existing L-systems, and writing new grammars to capture some desired 
effect is certainly not easy. 

17.7 Groups of Objects 

To animate multiple objects one can, of course, simply apply the standard techniques
outlined above to each of them. This works reasonably well for a moderate number
of independent objects whose desired motion is known in advance. However,
in many cases, some kind of coordinated action in a dynamic environment is
necessary. If only a few objects are involved, the animator can use an artificial
intelligence (AI)-based system to automatically determine immediate tasks for each
object based on some high-level goal, plan the necessary motion, and execute the
plan. Many modern games use such autonomous objects to create smart monsters
or the player’s collaborators.

Interestingly, as the number of objects in a group grows from just a few to 
several dozen, hundreds, or thousands, individual members of a group need
only very limited “intelligence” in order for the group as a whole to exhibit
what looks like coordinated goal-driven motion. It turns out that this flocking 
is emergent behavior which can arise as a result of limited interaction of group 
members with just a few of their closest neighbors (Reynolds, 1987). Flocking 
should be familiar to anyone who has observed the fascinatingly synchronized 
motion of a flock of birds or a school of fish. The technique can also be used to 
control groups of animals moving over terrain or even a human crowd.

Figure 17.28. (Left) An individual flock member (boid) can experience several urges
of different importance (shown by line thickness) which have to be negotiated into a
single velocity vector. A boid is aware of only its limited neighborhood (circle).
(Right) Boid control is commonly implemented as three separate modules.


At any given moment, the motion of a member of a group, often called a boid
when applied to flocks, is the result of balancing several often contradictory
tendencies, each of which suggests its own velocity vector (see Figure 17.28). First,
there are external physical forces $\mathbf{F}$ acting on the boid, such as gravity or wind.
The new velocity due to those forces can be computed directly through Newton’s law
as

$$\mathbf{v}_{\text{new}}^{\text{physics}} = \mathbf{v}_{\text{old}} + \mathbf{F}\,\Delta t / m.$$

Second, a boid should react to the global environment and to the behavior of other
group members. Collision avoidance is one of the main results of such
interaction. It is crucial for flocking that each group member has only a limited field
of view, and therefore is aware only of things happening within some neighborhood
of its current position. To avoid objects in the environment, the simplest, if
imperfect, strategy is to set up a limited-extent repulsive force field around each such
object. This will create a second desired velocity vector $\mathbf{v}_{\text{new}}^{\text{col\_avoid}}$, also given
by Newton’s law. Interaction with other group members can be modeled by
simultaneously applying different steering behaviors, resulting in several additional
desired velocity vectors $\mathbf{v}_{\text{new}}^{\text{steer}}$. Moving away from neighbors to avoid crowding,
steering towards flock mates to ensure flock cohesion, and adjusting a boid’s velocity
to align with the average heading of neighbors are the most common. Finally, some
additional desired velocity vectors $\mathbf{v}_{\text{new}}^{\text{goal}}$ are usually applied to achieve needed global
goals. These can be vectors along some path in space, following some specific
designated leader of the flock, or simply representing the migratory urge of a flock
member.
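The three common steering behaviors can be sketched as below. This is a minimal 2D illustration, not the formulation of any particular system: the dictionary layout of a boid, the function name, and the inverse-distance weighting of the separation urge are assumptions made for the example.

```python
import math


def steering_urges(boid, flock, radius):
    """Return (separation, cohesion, alignment) vectors for one boid.

    Each boid is a dict with 2D tuples "pos" and "vel". Only flock members
    closer than `radius` are considered, mirroring the limited neighborhood.
    """
    px, py = boid["pos"]
    vx, vy = boid["vel"]
    neighbors = [
        other for other in flock
        if other is not boid
        and math.hypot(other["pos"][0] - px, other["pos"][1] - py) < radius
    ]
    if not neighbors:
        return (0.0, 0.0), (0.0, 0.0), (0.0, 0.0)

    n = len(neighbors)
    # Separation: push away from each neighbor, more strongly when it is close.
    sep_x = sep_y = 0.0
    for other in neighbors:
        dx, dy = px - other["pos"][0], py - other["pos"][1]
        dist_sq = dx * dx + dy * dy
        if dist_sq > 0.0:
            sep_x += dx / dist_sq
            sep_y += dy / dist_sq
    # Cohesion: head towards the center of the visible neighbors.
    center_x = sum(o["pos"][0] for o in neighbors) / n
    center_y = sum(o["pos"][1] for o in neighbors) / n
    # Alignment: match the average velocity of the visible neighbors.
    avg_vx = sum(o["vel"][0] for o in neighbors) / n
    avg_vy = sum(o["vel"][1] for o in neighbors) / n
    return (
        (sep_x, sep_y),
        (center_x - px, center_y - py),
        (avg_vx - vx, avg_vy - vy),
    )
```

A boid with no visible neighbors receives three zero vectors, so only the physics and goal urges act on it.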

Once all $\mathbf{v}_{\text{new}}$ are determined, the final desired vector is negotiated based on
priorities among them. Collision avoidance and velocity matching typically have
higher priority. Instead of simple averaging of desired velocity vectors, which can
lead to cancellation of urges and unnatural “moving nowhere” behavior, an
acceleration allocation strategy is used. Some fixed total amount of acceleration is
made available for a boid, and fractions of it are given to each urge in order
of priority. If the total available acceleration runs out, some lower priority urges
will have less effect on the motion or be completely ignored. The hope is that
once the currently most important task (collision avoidance in most situations) is
accomplished, other tasks can be taken care of in the near future. It is also important
to respect some physical limitations of real objects, for example, by clamping
excessively high accelerations or speeds to some realistic values. Depending on the
internal complexity of the flock member, the final stage of animation might be to turn
the negotiated velocity vector into a specific set of parameters (bird's wing positions,
orientation of a plane model in space, leg skeleton bone configuration) used to
control a boid’s motion. A diagram of a system implementing flocking is shown in
Figure 17.28 (right).
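One way to read the acceleration allocation strategy is as a budget that is spent in priority order. The sketch below follows that reading; the function name and the choice to measure each urge by the magnitude of its requested acceleration are assumptions made for this example.

```python
import math


def allocate_acceleration(requests, max_accel):
    """Combine acceleration requests, listed from highest to lowest priority.

    Each request is a 2D tuple. Requests are granted in order until the total
    magnitude budget `max_accel` is used up; the request that exhausts the
    budget is scaled down, and any remaining ones are ignored.
    """
    total_x = total_y = 0.0
    budget = max_accel
    for ax, ay in requests:
        magnitude = math.hypot(ax, ay)
        if magnitude == 0.0:
            continue
        if magnitude <= budget:
            total_x += ax
            total_y += ay
            budget -= magnitude
        else:
            scale = budget / magnitude
            total_x += ax * scale
            total_y += ay * scale
            break
    return total_x, total_y
```

With a budget of 5, a collision-avoidance request of (3, 0) is granted in full and a lower-priority request of (0, 4) is cut in half. Two opposing urges therefore never cancel to zero: the higher-priority one wins.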

A much simpler, but still very useful, version of group control is implemented
by particle systems (Reeves, 1983). The number of particles in a system is
typically much larger than the number of boids in a flock and can be in the tens or
hundreds of thousands, or even more. Moreover, the exact number of particles can
fluctuate during animation, with new particles being born and some of the old
ones destroyed at each step. Particles are typically completely independent of
each other, ignoring their neighbors and interacting with the environment only
by experiencing external forces and collisions with objects, not through collision
avoidance as was the case for flocks. At each step during animation, the
system first creates new particles with some initial parameters, terminates old ones,
and then computes the necessary forces and updates the velocities and positions of
the remaining particles according to Newton’s law.

Figure 17.29. After being emitted by a directional source, particles collide with an
object and then are blown down by a local wind field once they clear the obstacle.

All parameters of a particle system (number of particles, particle life span,
initial velocity and location of a particle, etc.) are usually under the direct control
of the animator. Prime applications of particle systems include modeling
fireworks, explosions, spraying liquids, smoke and fire, or other fuzzy objects and
phenomena with no sharp boundaries. To achieve a realistic appearance, it is
important to introduce some randomness to all parameters, for example, having a
random number of particles born (and destroyed) at each step with their velocities
generated according to some distribution. In addition to setting appropriate initial
parameters, controlling the motion of a particle system is commonly done by
creating a specific force pattern in space, for example, blowing a particle in a new
direction once it reaches some specific location or adding a center of attraction.
One should remember that with all their advantages, simplicity of implementation
and ease of control being the prime ones, particle systems typically do not
provide the level of realism characteristic of true physics-based simulation of the
same phenomena.
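A single animation step of a particle system, as described above, might look like the following sketch. Everything specific here is an assumption for illustration: the dictionary layout, the emission counts, the Gaussian velocity distribution, and the `force_at` callback that stands in for a force pattern in space.

```python
import random


def emit(particles, rng, mean_count=5):
    """Create a random number of new particles with randomized velocities."""
    for _ in range(rng.randint(0, 2 * mean_count)):
        particles.append({
            "pos": [0.0, 0.0],
            "vel": [rng.gauss(0.0, 0.5), rng.gauss(4.0, 0.5)],
            "mass": 1.0,
            "age": 0.0,
        })


def advance(particles, dt, force_at, max_age):
    """Remove expired particles, then integrate the rest with Newton's law."""
    particles[:] = [p for p in particles if p["age"] < max_age]
    for p in particles:
        fx, fy = force_at(p["pos"])          # force may depend on location
        p["vel"][0] += fx / p["mass"] * dt
        p["vel"][1] += fy / p["mass"] * dt
        p["pos"][0] += p["vel"][0] * dt
        p["pos"][1] += p["vel"][1] * dt
        p["age"] += dt


def step(particles, dt, rng, force_at, max_age=2.0):
    """One animation step: birth, death, then force and motion update."""
    emit(particles, rng)
    advance(particles, dt, force_at, max_age)
```

A local wind field or a center of attraction can be expressed entirely through `force_at`, for example by returning a sideways force only when the position lies inside some region. Collisions with objects are left out of this sketch.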


Notes 

In this chapter we have concentrated on techniques used in 3D animation. There
also exists a rich set of algorithms to help with 2D animation production and
postprocessing of images created by computer graphics rendering systems. These
include techniques for cleaning up scanned-in artist drawings, feature
extraction, automatic 2D in-betweening, colorization, image warping, enhancement and
compositing, and many others.

One of the most significant developments in the area of computer animation
has been the increasing power and availability of sophisticated animation systems.
While different in their specific set of features, internal structure, details of user
interface, and price, most such systems include extensive support not only for
animation, but also for modeling and rendering, turning them into complete
production platforms. It is also common to use these systems to create still images.
For example, many images for figures in this section were produced using Maya
software generously donated by Alias.

Large-scale animation production is an extremely complex process which
typically involves a combined effort by dozens of people with different backgrounds
spread across many departments or even companies. To better coordinate this
activity, a certain production pipeline is established which starts with a story and
character sketches, proceeds to record necessary sound, build models, and rig
characters for animation. Once actual animation commences, it is common to
go back and revise the original designs, models, and rigs to fix any discovered
motion and appearance problems. Setting up lighting and material properties is
then necessary, after which it is possible to start rendering. In most sufficiently
complex projects, extensive postprocessing and compositing stages bring together
images from different sources and finalize the product.

We conclude this chapter by reminding the reader that in the field of computer 
animation any technical sophistication is secondary to a good story, expressive 
characters, and other artistic factors, most of which are hard or simply impossible 
to quantify. It is safe to say that Snow White and her seven dwarfs will always 
share the screen with green ogres and donkeys, and most of the audience will be 
much more interested in the characters and the story than in which, if any,
computers (and in what exact way) helped to create either of them. 




Using Graphics Hardware 


Throughout most of this book, the focus has been on the fundamentals underlying 
computer graphics rather than on implementation details. This chapter takes a 
slightly different route and blends the details of using graphics hardware with the 
practical issues associated with programming that hardware. 

This chapter, however, is not written to teach you OpenGL,™ other graphics 
APIs, or even the nitty-gritty specifics of graphics hardware programming. The
purpose of this chapter is to introduce the basic concepts and thought processes 
that are necessary when writing programs that use graphics hardware. 


18.1 What Is Graphics Hardware 

Graphics hardware describes the hardware components necessary to quickly
render 3D objects as pixels on your computer’s screen using specialized
rasterization-based hardware architectures. The use of this term is meant to elicit a
sense of the physical components necessary for performing these computations. In other
words, we’re talking about the chipsets, transistors, buses, and processors found
on many current video cards. As we will see in this chapter, current graphics
hardware is very good at processing descriptions of 3D objects and transforming
them into the colored pixels that fill your monitor.

One thing has been certain with graphics hardware: it changes very quickly,
with new extensions and features being added continually! One explanation for
the fast pace is the video game industry and its economic momentum. Essentially,
what this means is that each new graphics card provides better performance and
processing capabilities. As a result, graphics hardware is being used for tasks
that support a much richer use of 3D graphics. For instance, researchers are
performing computation on graphics hardware to perform ray tracing (Purcell et al.,
2002) and even solve the Navier-Stokes equations to simulate fluid flow (Harris,
2004).

Figure 18.1. The basic graphics hardware pipeline consists of stages that transform 3D
data into 2D screen objects ready for rasterizing and coloring by the pixel processing
stages.

Real-Time Graphics: By real-time graphics, we generally mean that the
graphics-related computations are being carried out fast enough that the results can be
viewed immediately. Being able to conduct operations at 60 Hz is considered real
time. Once the time to refresh the display (frame rate) drops below 15 Hz, the speed
is considered more interactive than it is real-time, but this distinction is not
critical. Because the computations need to be fast, the equations used to render the
graphics are often approximations to what could be done if more time were available.

Most graphics hardware has been built to perform a set of fixed operations 
organized as a pipeline designed to push vertices and pixels through different 
stages. The fixed functionality of the pipeline ensures that basic coloring, lighting, 
and texturing can occur very quickly—often referred to as real-time graphics. 

Figure 18.1 illustrates the real-time graphics pipeline. The important things 
to note about the pipeline follow: 

• The user program, or application, supplies the data to the graphics hardware 
in the form of primitives, such as points, lines, or polygons describing the 
3D geometry. Images or bitmaps are also supplied for use in texturing 
surfaces. 

• Geometric primitives are processed on a per-vertex basis and are
transformed from 3D coordinates to 2D screen triangles.

• Screen objects are passed to the pixel processors, rasterized, and then
colored on a per-pixel basis before being output to the frame buffer, and
eventually to the monitor.
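To make the per-vertex stage concrete, the sketch below maps one 3D vertex to 2D pixel coordinates. It is a deliberately simplified stand-in for what the hardware does: it assumes the vertex is already in camera space, that a pinhole camera at the origin looks down the negative z-axis, and it omits clipping and depth. The function name and parameters are hypothetical.

```python
def to_screen(vertex, focal, width, height):
    """Project a 3D camera-space vertex to 2D pixel coordinates.

    Assumes a pinhole camera at the origin looking down the negative z-axis.
    """
    x, y, z = vertex
    if z >= 0.0:
        raise ValueError("vertex is not in front of the camera")
    # Perspective divide: points farther away move towards the image center.
    ndc_x = focal * x / -z
    ndc_y = focal * y / -z
    # Map the range [-1, 1] to pixels, with y increasing downward.
    return (ndc_x + 1.0) * 0.5 * width, (1.0 - ndc_y) * 0.5 * height


def triangle_to_screen(triangle, focal, width, height):
    """Transform each of the three vertices of a triangle independently."""
    return [to_screen(v, focal, width, height) for v in triangle]
```

Because every vertex is handled independently of the others, this stage parallelizes well, which is one reason it maps naturally onto dedicated hardware.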


18.2 Describing Geometry for the Hardware 


As a graphics programmer, you need to be concerned with how the data
associated with your 3D objects is transferred onto the memory cache of the graphics
hardware. Unfortunately (or maybe fortunately), as a programmer you don’t have
complete control over this process. There are a variety of ways to place your
data on the graphics hardware, and each has its own advantages, which will be
discussed in this section. Any of the APIs you might use to program your video
card will provide different methods to load data onto the graphics hardware
memory. The examples that follow are presented in pseudocode that is based loosely
on the C function syntax of OpenGL,™ but semantically the examples should be
applicable to other graphics APIs.

Most graphics hardware works with specific sets of geometric primitives. The
primitive types trade primitive complexity for processing speed on the graphics
hardware: simpler primitives can be processed very fast. The caveat is that
the primitive types need to be general purpose so as to model a wide range of
geometry from very simple to very complex. On typical graphics hardware, the
primitive types are limited to one or more of the following:

• points: single vertices used to represent points or particle systems;

• lines: pairs of vertices used to represent lines, silhouettes, or
edge-highlighting;

• polygons: triangles, triangle strips, indexed triangles, indexed triangle
strips, quadrilaterals, general convex polygons, etc., used for describing
triangle meshes, geometric surfaces, and other solid objects, such as spheres,
cones, cubes, or cylinders.

Primitives: The three primitives (points, lines, and polygons) are the only
primitives available! Even when creating spline-based surfaces, such as NURBS,
the surfaces are tessellated into triangle primitives by the graphics hardware.

These three primitives form the basic building blocks for most geometry you
will define. (An example of a triangle mesh is shown in Figure 18.2.) Using these
primitives, you can build descriptions of your geometry using one of the graphics
APIs and send the geometry to the graphics hardware for rendering.

Point Rendering: Point and line primitives may initially appear to be limited in
use, but researchers have used points to render very complex geometry
(Rusinkiewicz & Levoy, 2000; Dachsbacher et al., 2003).

Figure 18.2. How your geometry is organized will affect the performance of your
application. This wireframe depiction of the Little Cottonwood Canyon terrain dataset
shows tens of thousands of triangles organized in a triangle mesh running at real-time
rates. The image is rendered using the VTerrain Project terrain system courtesy of
Ben Discoe.

For instance, to transfer the description of a line to the graphics hardware, we
might use the following:


beginLine();
    vertex( x1, y1, z1 );
    vertex( x2, y2, z2 );
endLine();



Figure 18.3. A triangle strip composed of five vertices defining three triangles.


In this example, two things occur. First, one of the primitive types is declared and
made active by the beginLine() function call. The line primitive is then made
inactive by the endLine() function call. Second, all vertices declared between
these two functions are copied directly to the graphics card for processing with
the vertex function calls.

A second example creates a set of triangles grouped together in a strip (refer 
to Figure 18.3); we could use the following code: 


beginTriangleStrip();
    vertex( x0, y0, z0 );
    vertex( x1, y1, z1 );
    vertex( x2, y2, z2 );
    vertex( x3, y3, z3 );
    vertex( x4, y4, z4 );
endTriangleStrip();


In this example, the primitive type, TriangleStrip, is made active and the set
of vertices that define the triangle strip are copied to the graphics card memory for
processing. Note that ordering does matter when describing geometry. In the
triangle strip example, connectivity between adjacent triangles is embedded within
the ordering of the vertices. Triangle t0 is constructed from vertices (v0, v1, v2),
triangle t1 from vertices (v1, v3, v2), and triangle t2 from vertices (v2, v3, v4).
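The ordering rule can be written down as a small helper that expands a strip into its individual triangles. This is an illustrative sketch rather than part of any graphics API; the function name is hypothetical. Every second triangle has its last two vertices swapped, which reproduces the (v1, v3, v2) ordering given above and keeps all triangles facing the same way.

```python
def strip_to_triangles(strip):
    """Expand a triangle strip into individual triangles.

    `strip` is a sequence of vertices (or vertex indices). Two vertices of
    every second triangle are swapped so that all triangles keep the same
    winding order.
    """
    triangles = []
    for i in range(len(strip) - 2):
        a, b, c = strip[i], strip[i + 1], strip[i + 2]
        triangles.append((a, b, c) if i % 2 == 0 else (a, c, b))
    return triangles
```

For the five-vertex strip in the example this yields three triangles, and in general a strip of n vertices yields n - 2 triangles.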

The key point to learn from these simple examples is that geometry is defined 
for rendering on the graphics hardware using a primitive type along with a set of 
vertices. The previous examples are simple and push the vertices directly onto 
the graphics hardware. However, in practice, you will need to make conscious 
decisions about how you will push your data to the graphics hardware. These 
issues will be discussed shortly. 

As geometry is passed to the graphics hardware, additional data can be
specified for each vertex. This extra data is useful for defining state attributes that
might represent the color of the vertex, the normal direction at the vertex, texture
coordinates at the vertex, or other per-vertex data. For instance, to set the color
and normal state parameters at each vertex of a triangle strip, we might use the
following code:

beginTriangleStrip();
    color( r0, g0, b0 ); normal( n0x, n0y, n0z );
    vertex( x0, y0, z0 );
    color( r1, g1, b1 ); normal( n1x, n1y, n1z );
    vertex( x1, y1, z1 );
    color( r2, g2, b2 ); normal( n2x, n2y, n2z );
    vertex( x2, y2, z2 );
    color( r3, g3, b3 ); normal( n3x, n3y, n3z );
    vertex( x3, y3, z3 );
    color( r4, g4, b4 ); normal( n4x, n4y, n4z );
    vertex( x4, y4, z4 );
endTriangleStrip();

Here, the color and normal direction at each vertex are specified just prior to the
vertex being defined. Each vertex in this example has a unique color and normal
direction. The color function sets the active color state using an RGB 3-tuple.
The normal direction state at each vertex is set by the normal function. Both the
color and normal functions affect the current rendering state on the graphics
hardware. Any vertices defined after these state attributes are set will be bound
with those state attributes.

This is a good moment to mention that the graphics hardware maintains a
fairly elaborate set of state parameters that determine how vertices and other
components are rendered. Some state is bound to vertices, such as color, normal
direction, and texture coordinates, while other state may affect pixel-level rendering.
The graphics state at any particular moment describes a large set of internal
hardware parameters. This aspect of graphics hardware is important to consider when
you write 3D applications. As you might suspect, making frequent changes to the
graphics state affects performance at least to some extent. However, attempting
to minimize graphics state changes is only one of many areas where thoughtful
programming should be applied. You should attempt to minimize state changes
when you can, but it is unlikely that you can group all of your geometry to
completely eliminate state context switches. One data structure that can help minimize
state changes, especially on static scenes, is the scene graph data structure. Prior
to rendering any geometry, the scene graph can reorganize the geometry and
associated graphics state in an attempt to minimize state changes. Scene graphs are
described in Chapter 12.

color( r, g, b );
normal( nx, ny, nz );
beginTriangleStrip();
    vertex( x0, y0, z0 );
    vertex( x1, y1, z1 );
    vertex( x2, y2, z2 );
    vertex( x3, y3, z3 );
    vertex( x4, y4, z4 );
endTriangleStrip();

All vertices in this TriangleStrip have the same color and normal direction, 
so these state parameters can be set prior to defining the vertices. This minimizes 
both function call overhead and changes to the internal graphics state. 

Many things can affect the performance of a graphics program, but one of the
potentially large contributors to performance (or lack thereof) is how your
geometry is organized and whether it is stored in the memory cache of the graphics card.
In the pseudocode examples provided so far, geometry has been pushed onto the
graphics hardware in what is often called immediate mode rendering. As vertices
are defined, they are sent directly to the graphics hardware. The primary
disadvantage of immediate mode rendering is that the geometry is sent to the graphics
hardware each iteration of your application. If your geometry is static (i.e., it
doesn't change), then there is no real need to resend the data each time you
redraw a frame. In these and other circumstances, it is more desirable to store the
geometry in the graphics card’s memory.

The graphics hardware in your computer is connected to the rest of the system 
via a data bus, such as the PCI, AGP, or PCI-Express buses. When you send data 
to the graphics hardware, it is sent by the CPU on your machine across one of 
these buses, eventually being stored in the memory on your graphics hardware. If 
you have very large triangle meshes representing complex geometry, passing all 
this data across the bus can end up resulting in a large hit to performance. This 
is especially true if the geometry is being rendered in immediate mode, as the 
previous examples have illustrated. 

There are various ways to organize geometry; some can help reduce the
overall bandwidth needed for transmitting the geometry across the graphics bus. Some
possible organization approaches include:

• triangles. Triangles are specified with three vertices. A triangle mesh
created in this manner requires that each triangle in the mesh be defined
separately with many vertices potentially duplicated. For a triangle mesh
containing m triangles, 3m vertices will be sent to the graphics hardware.

• triangle strips. Triangles are organized in strips; the first three vertices
specify the first triangle in the strip and each additional vertex adds a
triangle. If you create a triangle mesh with m triangles organized as a single
triangle strip, you send three vertices to the graphics hardware for the first
triangle followed by a single vertex for each additional triangle in the strip
for a total of m + 2 vertices.

• indexed triangles. Triangle vertices are arranged as an array of vertices
with a separate array defining the triangles using indices into the vertex
array. Vertex arrays are sent to the graphics card with very few function
calls.

• indexed triangle strips. Similar to indexed triangles, triangle vertices are
stored in a vertex array. However, triangles are organized in strips with
the index array defining the strip layout. This is the most compact of the
organizational structures for defining triangle meshes as it combines the
benefits of triangle strips with the compactness of vertex arrays.
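The difference between these organizations is easy to quantify. The sketch below builds an indexed representation from separately specified triangles and reports the vertex counts given above; the function names are hypothetical, and vertices are assumed to be hashable tuples that compare equal when they are shared.

```python
def build_index(triangles):
    """Convert separately specified triangles to indexed triangles.

    `triangles` is a list of 3-tuples of vertex tuples. Returns a vertex
    array without duplicates and a flat index array, three entries per
    triangle.
    """
    vertices = []
    indices = []
    slot = {}                      # vertex -> position in `vertices`
    for triangle in triangles:
        for vertex in triangle:
            if vertex not in slot:
                slot[vertex] = len(vertices)
                vertices.append(vertex)
            indices.append(slot[vertex])
    return vertices, indices


def vertices_sent(m):
    """Vertex counts for m triangles sent separately vs. as a single strip."""
    return {"triangles": 3 * m, "strip": m + 2}
```

For a quadrilateral made of two triangles, the separate form sends six vertices while the indexed form stores four vertices plus six small integer indices. The savings grow with mesh size, since interior vertices of a typical mesh are shared by several triangles.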

Of the different organizational structures, the use of vertex arrays, either through 
indexed triangles or indexed triangle strips, provides a good option for increasing 
the performance of your application. The tight encapsulation of the organization 
means that many fewer function calls need to be made as well. Once the vertices 
and indices are stored in an array, only a few function calls need to be made to 
transfer the data to the graphics hardware, whereas with the pseudocode examples 
illustrated previously, a function is called for each vertex. 

At this point, you may be wondering how graphics state such as colors,
normals, or texture coordinates is defined when vertex arrays are used. In the
immediate-mode rendering examples earlier in the chapter, interleaving the
graphics state with the associated vertices is obvious based on the order of the function
calls. When vertex arrays are used, graphics state can either be interleaved in the
vertex array or specified in separate arrays that are passed to the graphics
hardware.
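The two layouts can be illustrated with plain Python lists standing in for the arrays handed to a graphics API. The attribute order (position, color, normal) and the stride of nine values per vertex are assumptions chosen for this sketch; real APIs let the application describe its own layout.

```python
def interleave(positions, colors, normals):
    """Pack per-vertex attributes from separate arrays into one flat array.

    Layout per vertex (stride of 9 values): x, y, z, r, g, b, nx, ny, nz.
    """
    if not (len(positions) == len(colors) == len(normals)):
        raise ValueError("every vertex needs a position, a color, and a normal")
    packed = []
    for position, color, normal in zip(positions, colors, normals):
        packed.extend(position)
        packed.extend(color)
        packed.extend(normal)
    return packed


def attribute(packed, vertex, offset, size=3, stride=9):
    """Read one attribute of one vertex back out of the interleaved array."""
    start = vertex * stride + offset
    return tuple(packed[start:start + size])
```

The `stride` and `offset` values play the same role as the stride and pointer parameters that graphics APIs typically ask for when an interleaved array is registered. With separate arrays, each attribute simply has its own array and a stride equal to its own size.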

Even if the geometry is organized efficiently when it is sent to the graphics
hardware, you can achieve higher performance gains if you can store your
geometry in the graphics hardware’s memory for the duration of your application. A
somewhat unfortunate fact about current graphics hardware is that many of the
specifications describing the layout of the graphics hardware memory and cache
structure are often not widely publicized. Fortunately though, there are ways
using graphics APIs that allow programmers to place geometry into the graphics
hardware memory, resulting in applications that run faster.

Two commonly used methods to store geometry and graphics state in the 
graphics hardware cache involve creating display lists or vertex buffer objects. 

Optimal Organization: Much research effort has gone into looking at ways to
optimize triangle meshes for maximum performance on graphics hardware. A good place
to start reading if you want to delve further into understanding how triangle mesh
organization affects performance is the SIGGRAPH 1999 paper on the optimization of
mesh locality (Hoppe, 1999).

Display lists compile a compact list representation of the geometry and the
state associated with the geometry and store the list in the memory on the graphics
hardware. The benefits of display lists are that they are general purpose and good
at storing a static geometric representation plus associated graphics state on the
hardware. They do not work well at all for continuously changing geometry and
graphics state, since the display list must be recompiled and then stored again
in the graphics hardware memory for every iteration in which the display list
changes.

// record a triangle strip and its graphics state into a display list
displayID = createDisplayList();
color( r, g, b );
normal( nx, ny, nz );
beginTriangleStrip();
vertex( x0, y0, z0 );
vertex( x1, y1, z1 );
...
vertex( xN, yN, zN );
endTriangleStrip();
endDisplayList();

In the above example, a display list is created that contains the definition of a
triangle strip with its associated color and normal information. The commands
between the createDisplayList and endDisplayList function calls
provide the elements that define the display list. Display lists are most often created
during an initialization phase of an application. After the display list is created, it
is stored in the memory of the graphics hardware and can be referenced for later
use by the identifier assigned to the list.

// draw the display list created earlier
drawDisplayList(displayID);

When it is time to draw the contents of the display list, a single function call will 
instruct the graphics hardware to access the memory indexed through the display 
list identifier and display the contents. 
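The record-once, replay-many behavior of a display list can be mimicked with a small Python class. This is a toy model only: DisplayListRecorder and its methods are hypothetical stand-ins for the pseudocode calls above, and nothing here talks to real graphics hardware.

```python
class DisplayListRecorder:
    """Toy stand-in for a graphics API that can record and replay commands."""

    def __init__(self):
        self.lists = {}          # display list id -> tuple of recorded commands
        self.recording = None    # commands of the list currently being built
        self.executed = []       # commands the "hardware" has executed

    def create_display_list(self):
        self.recording = []
        return len(self.lists) + 1

    def command(self, name, *args):
        # While recording, commands are stored instead of executed.
        if self.recording is not None:
            self.recording.append((name, args))
        else:
            self.executed.append((name, args))

    def end_display_list(self, list_id):
        self.lists[list_id] = tuple(self.recording)  # "compiled" and frozen
        self.recording = None

    def draw_display_list(self, list_id):
        # One call from the application replays every stored command.
        self.executed.extend(self.lists[list_id])

api = DisplayListRecorder()
strip = api.create_display_list()
api.command("color", 1.0, 0.0, 0.0)
api.command("vertex", 0.0, 0.0, 0.0)
api.command("vertex", 1.0, 0.0, 0.0)
api.command("vertex", 0.0, 1.0, 0.0)
api.end_display_list(strip)

for frame in range(3):
    api.draw_display_list(strip)
```

After three frames the application has issued only three draw calls, while the four stored commands have been replayed twelve times in total.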

A second method to store geometry on the graphics hardware for the duration 
of your application is through vertex buffer objects (VBOs). VBOs are specialized 
buffers that reside in high-performance memory on the graphics hardware and 
store vertex arrays and associated graphics state. They can also provide a mapping 
from your application to the memory on the graphics hardware to allow for fast 
access and updating to the contents of the VBO. 

The chief advantage of VBOs is that they provide a mapping into the graphics 
hardware memory. With VBOs, geometry can be modified during an application 
with a minimal loss of performance as compared with using immediate mode 
rendering or display lists. This is extremely useful if portions of your geometry 
change during each iteration of your application or if the indices used to organize 
your geometry change. 

VBOs are created in much the same way indexed triangles and indexed triangle
strips are built. A buffer object is first created on the graphics card to make
room for the vertex array containing the vertices of the triangle mesh. Next, the
vertex array and index array are copied over to the graphics hardware. When it
is time to render the geometry, the vertex buffer object identifier can be used to
instruct the graphics hardware to draw your geometry. If you are already using
vertex arrays in your application, modifying your code to use VBOs should
require only minimal changes.
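The create, copy, update, and draw steps can be sketched with a toy model in Python. ToyBufferStore and its methods are hypothetical and only imitate the bookkeeping; a real application would use the buffer object functions of its graphics API. The counter floats_uploaded is an assumption added here to make the cost of each step visible.

```python
class ToyBufferStore:
    """Toy model of buffer objects living in graphics memory (not a real API)."""

    def __init__(self):
        self.buffers = {}         # buffer id -> list of floats held "on the card"
        self.floats_uploaded = 0  # running total of data sent to the hardware

    def create_buffer(self, data):
        buffer_id = len(self.buffers) + 1
        self.buffers[buffer_id] = list(data)   # initial copy to the hardware
        self.floats_uploaded += len(data)
        return buffer_id

    def update_range(self, buffer_id, start, values):
        # Analogous to writing through a mapping: only the changed span moves.
        self.buffers[buffer_id][start:start + len(values)] = values
        self.floats_uploaded += len(values)

    def draw(self, buffer_id, indices, floats_per_vertex=3):
        # Drawing reads vertices that are already resident in graphics memory.
        data = self.buffers[buffer_id]
        return [tuple(data[i * floats_per_vertex:(i + 1) * floats_per_vertex])
                for i in indices]

store = ToyBufferStore()
vertices = [0.0, 0.0, 0.0,  1.0, 0.0, 0.0,  0.0, 1.0, 0.0]
vbo = store.create_buffer(vertices)
store.update_range(vbo, 3, [2.0, 0.0, 0.0])   # move only the second vertex
triangle = store.draw(vbo, [0, 1, 2])
```

In this sketch, moving one vertex costs three floats of traffic rather than a full re-upload of the nine-float array, which is the kind of saving the mapping is meant to provide.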


18.3 Processing Geometry into Pixels 


After the geometry has been placed in the graphics hardware memory, each vertex
must be lit as well as transformed into screen coordinates during the geometry
processing stage. In the fixed-function graphics pipeline illustrated in Figure 18.1,
vertices are transformed from a model coordinate system to a screen coordinate
frame of reference. This process and the matrices involved are described in
Chapters 7 and 8. The modelview and projection matrices needed for this
transformation are defined using functions provided with the graphics API you
decide to use.

Lighting is calculated on a per-vertex basis. Depending on the global shading 
parameters, the triangle face will either have a flat-shaded look or the face color 
will be diffusely shaded (Gouraud shading) by linearly interpolating the color at 
each triangle vertex across the face of the triangle. The latter method produces 
a much smoother appearance. The color at each vertex is computed based on 
the assigned material properties, the lights in the scene, and various lighting 
parameters. 

The lighting model in the fixed-function graphics pipeline is good for fast
lighting of vertices; it trades accurate illumination for increased speed.
As a result, Phong shaded surfaces are not supported with this fixed-function
framework.

In particular, the diffuse shading algorithm built into the graphics hardware
often fails to compute the appropriate illumination since the lighting is only
calculated at each vertex. For example, when the distance to the light source is
small compared with the size of the face being shaded, the illumination on
the face will be incorrect. Figure 18.4 illustrates this situation. The center of
the triangle will not be illuminated brightly despite being very close to the light
source, since the lighting at the vertices, which are far from the light source, is
used to interpolate the shading across the face.
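A small numerical experiment makes the failure concrete. The Python sketch below assumes a simple Lambertian term, max(0, n · l), and a hypothetical triangle and light position chosen to resemble the situation in Figure 18.4; none of the values come from the text.

```python
import math

def diffuse(point, light, normal=(0.0, 0.0, 1.0)):
    """Lambertian term max(0, n . l) for a surface point and a point light."""
    to_light = [l - p for l, p in zip(light, point)]
    length = math.sqrt(sum(c * c for c in to_light))
    unit = [c / length for c in to_light]
    return max(0.0, sum(n * c for n, c in zip(normal, unit)))

# Hypothetical setup: a large triangle in the z = 0 plane with a light
# hovering 0.1 units above its centroid.
vertices = [(-10.0, -10.0, 0.0), (10.0, -10.0, 0.0), (0.0, 20.0, 0.0)]
centroid = tuple(sum(v[i] for v in vertices) / 3.0 for i in range(3))
light = (centroid[0], centroid[1], 0.1)

# Per-vertex lighting followed by interpolation (weights of 1/3 at the centroid).
vertex_values = [diffuse(v, light) for v in vertices]
interpolated_center = sum(vertex_values) / 3.0

# Lighting evaluated directly at the centroid.
direct_center = diffuse(centroid, light)
```

Evaluated directly, the centroid receives essentially full illumination, while the value interpolated from the three distant vertices stays below one percent of that.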

With the fixed-function pipeline, this issue can only be remedied by increasing
the tessellation of the geometry. This solution works but is of limited use in
real-time graphics, as the added geometry required for more accurate illumination
can result in slower rendering.

Figure 18.4. The distance to the light source is small relative to the size of
the triangle.

However, with current hardware, the problem of obtaining better approximations
for illumination can be solved without necessarily increasing the geometric
complexity of the objects. The solution involves replacing the fixed-function
routines embedded within the graphics hardware with your own programs. These
small programs run on the graphics hardware and perform a part of the geometry
processing and pixel-processing stages of the graphics pipeline.


18.3.1 Programming the Pipeline 


Definition: Fragment is a term that describes the information associated with
a pixel prior to being processed by the graphics hardware. This definition
includes much of the data that might be used to calculate the color of the pixel,
such as the pixel's scene depth, texture coordinates, or stencil information.


Fairly recent changes to the organization of consumer graphics hardware have
generated a substantial buzz among game developers, graphics researchers, and many
others. It is quite likely that you have heard about GPU programming, graphics
hardware programming, or even shader programming. These terms and the
changes in consumer hardware that have spawned them primarily have to do with
how the graphics hardware rendering pipeline can now be programmed.

Specifically, the changes have opened up two aspects of the graphics
hardware pipeline. Programmers now have the ability to modify how the hardware
processes vertices and shades pixels by writing vertex shaders and fragment
shaders (also sometimes referred to as vertex programs or fragment programs).
Vertex shaders are programs that perform the vertex and normal transformations,
texture coordinate generation, and per-vertex lighting computations
normally computed in the geometry processing stage. Fragment shaders are
programs that perform the computations in the pixel processing stage of the graphics
pipeline and determine exactly how each pixel is shaded, how textures are
applied, and if a pixel should be drawn or not. These small shader programs are
sent to the graphics hardware from the user program (see Figure 18.5), but they
are executed on the graphics hardware. What this programmability means for
you is that you essentially have a multi-processor machine. This turns out to be
a good way to think about your graphics hardware, since it means that you may
be able to use the graphics hardware processor to relieve the load on the CPU in
some of your applications. The graphics hardware processors are often referred
to as GPUs. GPU stands for graphics processing unit and highlights the fact
that graphics hardware components now contain a separate processor dedicated
to graphics-related computations.

[Figure 18.5 diagram labels: User Program, primitives, Geometry Processing,
2D screen coordinates, Pixel Processing, vertex program, pixel shader.]

Figure 18.5. The programmable graphics hardware pipeline. The user program supplies
primitives, vertex programs, and fragment programs to the hardware.

Interestingly, modern GPUs contain more transistors than modern CPUs. For 
the time being, GPUs are utilizing most of these transistors for computations and 
less for memory or cache management operations. 

However, this will not always be the case as graphics hardware continues to
advance. And just because the computations are geared towards 3D graphics,
it does not mean that you cannot perform computations unrelated to computer
graphics on the GPU. The manner in which the GPU is programmed is different
from that of your general purpose CPU and will require a slightly modified way of
thinking about how to solve problems and program the graphics hardware.

The GPU is a stream processor that excels at 3D vector operations such as
vector multiplication, vector addition, dot products, and other operations
necessary for basic lighting of surfaces and texture mapping. As stream processors,
both the vertex and fragment processing components include the ability to
process multiple primitives at the same time. In this regard, the GPU acts as a SIMD
(Single Instruction, Multiple Data) processor, and in certain hardware
implementations of the fragment processor, up to 16 pixels can be processed at a time.
When you write programs for these processing components, it will be helpful, at
least conceptually, to think of the computations being performed concurrently on
your data. In other words, the vertex shader program will run for all vertices at
the same time. The vertex computations will then be followed by a stage in which
your fragment shader program will execute simultaneously on all fragments. It
is important to note that while the computations on vertices or fragments occur
concurrently, the staging of the pipeline components still occurs in the same order.
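This mental model can be written down as a short Python sketch. It is a conceptual illustration only: the functions vertex_stage, fragment_stage, scale_by_two, and brightness are hypothetical, rasterization is omitted, and the list comprehensions run sequentially where real hardware would process many elements at once.

```python
def vertex_stage(vertices, vertex_program):
    # Conceptually, every vertex is processed at the same time; the
    # comprehension applies the same program to each one independently.
    return [vertex_program(v) for v in vertices]

def fragment_stage(fragments, fragment_program):
    # Likewise, the same fragment program runs on every fragment.
    return [fragment_program(f) for f in fragments]

def scale_by_two(vertex):
    return tuple(2.0 * c for c in vertex)

def brightness(fragment):
    # Hypothetical fragment program: use the x coordinate as a gray level.
    return min(1.0, max(0.0, fragment[0]))

vertices = [(0.1, 0.0, 0.0), (0.3, 0.5, 0.0), (0.6, 0.2, 0.0)]

# The stages themselves still run in pipeline order: all vertex work
# finishes before the fragment stage consumes its output.
transformed = vertex_stage(vertices, scale_by_two)
fragments = transformed  # stand-in for rasterization, which is omitted here
shades = fragment_stage(fragments, brightness)
```

Because each program sees only its own element, the order in which elements are visited within a stage does not affect the result, which is what allows the hardware to process them together.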

The manner in which vertex and fragment shaders work is simple. You write
a vertex shader program and a fragment shader program and send them to the
graphics hardware. These programs can be used on specific geometry, and when your
geometry is processed, the vertex shader is used to transform and light the
vertices, while the fragment shader performs the final shading of the geometry on a
per-pixel basis. Just as you can texture map different images onto different pieces
of geometry, you can also write different shader programs to act upon different
objects in your application. Shader programs are a part of the graphics state, so
you do need to be concerned with how your shader programs might get swapped
in and out based on the geometry being rendered.


Historical: Programming the pipeline is not entirely new. Among the first
graphics hardware architectures designed for programming flexibility were the
PixelFlow architectures and shading languages from UNC (Molnar et al., 1992;
Lastra et al., 1995; Olano & Lastra, 1998). Additional efforts to provide custom
shading techniques have included shade trees (Cook, 1984), RenderMan (Pixar,
2000), accelerated multipass rendering using OpenGL™ (Peercy et al., 2000), and
other real-time shading languages (Proudfoot et al., 2001; McCool et al., 2004).

The details tend to be a bit more complicated, however. Vertex shaders usually
perform two basic actions: set the color at the vertex and transform the vertex into
screen coordinates by multiplying the vertex by the modelview and projection
matrices. The perspective divide and clipping steps are not performed in a vertex
program. Vertex shaders are also often used to set the stage for a fragment shader.
In particular, you may have vertex attributes, such as texture coordinates or other
application-dependent data, that the vertex shader calculates or modifies and then
sends to the fragment processing stage for use in your fragment shader. It may
seem strange at first, but vertex shaders can be used to manipulate the positions
of the vertices. This is often useful for generating simulated ocean wave motion
entirely on the GPU.
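As a rough illustration of position manipulation, the Python sketch below applies a travelling sine wave to each vertex of a flat strip. It assumes a single wave with made-up parameters; wave_vertex_program and its arguments are hypothetical names, and a real vertex shader would receive the time and wave parameters from the application rather than as Python arguments.

```python
import math

def wave_vertex_program(position, time, amplitude=0.5, wavelength=4.0, speed=1.0):
    """Displace a vertex vertically with a travelling sine wave.

    The keyword parameters play the role of values that stay the same for
    every vertex in a given frame. All names and values are illustrative.
    """
    x, y, z = position
    phase = 2.0 * math.pi * (x - speed * time) / wavelength
    return (x, y + amplitude * math.sin(phase), z)

# A flat strip of vertices along the x axis stands in for an ocean surface.
flat_strip = [(float(i), 0.0, 0.0) for i in range(5)]
frame0 = [wave_vertex_program(p, time=0.0) for p in flat_strip]
frame1 = [wave_vertex_program(p, time=1.0) for p in flat_strip]
```

The stored geometry (flat_strip) never changes; only the time value differs between frames, so the crest that sits at x = 1 in the first frame has moved to x = 2 in the second.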

In a fragment shader, the program is required to output the fragment
color. This may involve looking up texture values and combining them in some
manner with values obtained by performing a lighting calculation at each pixel,
or it may involve discarding the fragment so that it is not drawn at all. Because
operations in the fragment shader operate at the fragment level, the real power of
the programmable graphics hardware is in the fragment shader. This added
processing power represents one of the key differences between the fixed-function
pipeline and the programmable pipeline. In the fixed pipeline, fragment
processing used illumination values interpolated between the vertices of the triangle to
compute the fragment color. With the programmable pipeline, the color at each
fragment can be computed independently. For instance, in the example situation
posed in Figure 18.4, Gouraud shading of a triangle face fails to produce a
reasonable solution because lighting only occurs at the vertices, which are farther away
from the light than the center of the triangle. In a fragment shader, the lighting
equation can be evaluated at each fragment, rather than at each vertex, resulting
in a more accurate rendering of the face.

18.3.2 Basic Execution Model 

When writing vertex or fragment shaders, there are a few important things to
understand in terms of how vertex and fragment programs execute and access data
on the GPU. Because these programs run entirely on the GPU, the first details
you will need to figure out are which data your shaders will use and how to get
that data to them. There are several characteristics associated with the data types
used in shader programs. The following terms, which come primarily from the
OpenGL™ Shading Language framework, are used to describe the conceptual
aspects of these data characteristics. The concepts are the same across different
shading language frameworks. In the shaders you write, variables are
characterized using one of the following terms:




• attributes. Attribute variables represent data that changes frequently, often 
on a per-vertex basis. Attribute variables are often tied to the changing 
graphics state associated with each vertex. For instance, normal vectors or 
texture coordinates are considered to be attribute data since they are part of 
the graphics state associated with each vertex. 

• uniforms. Uniform variables represent data that cannot change during the
execution of a shader program. However, uniform variables can be modified
by your application between executions of a shader. This provides
another way for your application to communicate data to a shader. Uniform
data often represent the graphics state associated with an application. For
instance, the modelview and projection matrices can be accessed through
uniform variables. Information about light sources in your application can
also be obtained through uniform variables. In these examples, the data
does not change while the shader is executing, but could (e.g., the light
could move) prior to the next iteration of the application.

• varying. Varying data is used to pass data between a vertex shader and
a fragment shader. The data is considered varying because
it is written by vertex shaders on a per-vertex basis, but read by fragment
shaders as values interpolated across the face of the primitive between
neighboring vertices.

Variables defined using one of these three characteristics can either be built-in
variables or user-defined variables. In addition to accessing the built-in graphics
state, attribute and uniform variables are one of the ways to communicate
user-defined data to your vertex and fragment programs. Varying data is the only means
to pass data from a vertex shader to a fragment shader. Figure 18.6 illustrates the
basic execution of the vertex and fragment processors in terms of the inputs and
outputs used by the shaders.
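The roles of the three kinds of variables can be traced through a tiny software pipeline. The Python sketch below is an analogy rather than shader code: vertex_shader, interpolate, and fragment_shader are hypothetical functions, the dictionaries stand in for attribute, uniform, and varying storage, and the barycentric weights are supplied by hand because rasterization is not modeled.

```python
def vertex_shader(attributes, uniforms):
    """Runs once per vertex; returns (position, varyings)."""
    x, y = attributes["position"]
    dx, dy = uniforms["offset"]                 # uniform: constant for the whole draw
    varyings = {"color": attributes["color"]}   # varying: written per vertex
    return (x + dx, y + dy), varyings

def interpolate(varyings_per_vertex, weights):
    """Blend per-vertex varyings with barycentric weights."""
    names = varyings_per_vertex[0].keys()
    return {
        name: tuple(
            sum(w * v[name][i] for w, v in zip(weights, varyings_per_vertex))
            for i in range(len(varyings_per_vertex[0][name]))
        )
        for name in names
    }

def fragment_shader(varyings, uniforms):
    """Runs once per fragment; reads interpolated varyings."""
    return tuple(uniforms["brightness"] * c for c in varyings["color"])

uniforms = {"offset": (1.0, 0.0), "brightness": 0.5}
triangle = [  # attributes: one record per vertex
    {"position": (0.0, 0.0), "color": (1.0, 0.0, 0.0)},
    {"position": (1.0, 0.0), "color": (0.0, 1.0, 0.0)},
    {"position": (0.0, 1.0), "color": (0.0, 0.0, 1.0)},
]

outputs = [vertex_shader(v, uniforms) for v in triangle]
positions = [p for p, _ in outputs]
varyings = [v for _, v in outputs]

# A fragment halfway along the edge between the first two vertices.
edge_midpoint = interpolate(varyings, (0.5, 0.5, 0.0))
fragment_color = fragment_shader(edge_midpoint, uniforms)
```

The fragment never sees the per-vertex colors directly; it receives the blend (0.5, 0.5, 0.0), which the uniform brightness then scales to (0.25, 0.25, 0.0).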

Another way to pass data to vertex and fragment shaders is by using texture
maps as sources and sinks of data. This may come as a surprise if you have been
thinking of texture maps solely as images that are applied to the outside surface of
geometry. Texture maps are important because they give you access
to the memory on the graphics hardware. When you write applications that run
on the CPU, you control the memory your application requires and have direct
access to it when necessary. On graphics hardware, memory is not accessed in
the same manner. In fact, you are not directly able to allocate and deallocate
general purpose memory chunks, and this particular aspect usually requires a slight
change in thinking.

Figure 18.6. The execution model for shader programs. Input, such as per-vertex attributes,
graphics state-related uniform variables, varying data, and texture maps are provided to
vertex and fragment programs within the shader processor. Shaders output special variables
used in later parts of the graphics pipeline.


Note: The shader language examples used in this chapter are presented using
GLSL (OpenGL™ Shading Language). This language was chosen since it is being
developed by the OpenGL™ Architecture Review Board and will likely become a
standard shading language for OpenGL™ with the release of OpenGL™ 2.0. As of
this writing, GLSL can be used on most modern graphics cards with updated
graphics hardware drivers.


Texture maps on graphics hardware, however, can be created, deleted, and
controlled through the graphics API you use. In other words, for general data
used by your shader, you will create texture maps that contain that data and then
use texture access functions to look up the data in the texture map. Technically,
textures can be accessed by both vertex and fragment shaders. However, in
practice, texture lookups from the vertex shader are not currently supported on all
graphics cards. An example that utilizes a texture map as a data source is bump
mapping. Bump mapping uses a normal map which defines how the normal
vectors change across a triangle face. A bump mapping fragment shader would look
up the normal vector in the normal map “texture data” and use it in the shading
calculations at that particular fragment.
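The idea of a texture as a lookup table of data can be sketched in a few lines of Python. Everything here is assumed for illustration: NORMAL_MAP is a made-up 2x2 table of unit normals, texture_lookup uses the simplest nearest-texel rule, and bump_fragment_shader computes only a diffuse term.

```python
# Hypothetical 2x2 "normal map": each texel stores a unit normal vector.
NORMAL_MAP = [
    [(0.0, 0.0, 1.0), (0.6, 0.0, 0.8)],
    [(0.0, 0.6, 0.8), (0.0, 0.0, 1.0)],
]

def texture_lookup(texture, u, v):
    """Nearest-texel lookup with texture coordinates in [0, 1]."""
    rows, cols = len(texture), len(texture[0])
    col = min(cols - 1, int(u * cols))
    row = min(rows - 1, int(v * rows))
    return texture[row][col]

def bump_fragment_shader(tex_coord, light_dir):
    """Shade one fragment using the normal stored in the normal map."""
    normal = texture_lookup(NORMAL_MAP, *tex_coord)
    return max(0.0, sum(n * l for n, l in zip(normal, light_dir)))

light_dir = (0.0, 0.0, 1.0)  # unit vector pointing from the surface to the light
flat_texel = bump_fragment_shader((0.1, 0.1), light_dir)
tilted_texel = bump_fragment_shader((0.9, 0.1), light_dir)
```

Two fragments on the same flat face receive different shading (1.0 and 0.8) purely because the data looked up in the map differs.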

You need to be concerned about the types of data you put into your texture
maps. Not all numerical data types are well supported and only recently
has graphics hardware included floating point textures with 16-bit components.
Moreover, none of the computation being performed on your GPU is done with
double-precision math! If numerical precision is important for your application,
you will need to think through these issues very carefully to determine if using
the graphics hardware for computation is useful.
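The effect of reduced precision can be checked on the CPU. The Python sketch below uses the standard struct module to round values to IEEE 754 16-bit and 32-bit formats; it assumes the texture or shader arithmetic follows those formats, which may not hold for every piece of graphics hardware. The helper names to_half and to_single are hypothetical.

```python
import struct

def to_half(value):
    """Round a Python float to the nearest IEEE 754 16-bit float and back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

def to_single(value):
    """Round a Python float (double precision) to 32-bit precision and back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

exact = 2049.0
as_half = to_half(exact)      # 16-bit floats cannot hold every integer above 2048
as_single = to_single(exact)  # 32-bit floats still hold this value exactly

tenth_error_half = abs(to_half(0.1) - 0.1)
tenth_error_single = abs(to_single(0.1) - 0.1)
```

Storing 2049 in a 16-bit component silently yields 2048, and the error in representing 0.1 is several orders of magnitude larger at 16 bits than at 32 bits.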

So what do these shader programs look like? One way to write vertex and
fragment shaders is through assembly language instructions. For instance,
performing a matrix multiplication in shader assembly language looks something
like this:

DP4 p[0].x, M[0], v[0];
DP4 p[0].y, M[1], v[0];
DP4 p[0].z, M[2], v[0];
DP4 p[0].w, M[3], v[0];

In this example, the DP4 instruction is a 4-component dot product function. It
stores the result of the dot product in the first register and performs the dot
product between the last two registers. In shader programming, registers hold
four components corresponding to the x, y, z, and w components of a homogeneous
coordinate, or the r, g, b, and a components of an RGBA tuple. So, in this example,
a simple matrix multiplication,

p = Mv

is computed by four DP4 instructions. Each instruction computes one element of
the final result.
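The correspondence between the four instructions and the matrix product can be verified with a short Python sketch. The function dp4 mirrors what the text says the instruction computes; the matrix and vector values are hypothetical and chosen so the result is easy to check by hand.

```python
def dp4(a, b):
    """4-component dot product, mirroring what the DP4 instruction computes."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]

def transform(M, v):
    """Compute p = M v with one dot product per row of M."""
    return (dp4(M[0], v), dp4(M[1], v), dp4(M[2], v), dp4(M[3], v))

# Hypothetical example: translate the homogeneous point (1, 2, 3, 1) by (10, 20, 30).
M = [
    (1.0, 0.0, 0.0, 10.0),
    (0.0, 1.0, 0.0, 20.0),
    (0.0, 0.0, 1.0, 30.0),
    (0.0, 0.0, 0.0, 1.0),
]
v = (1.0, 2.0, 3.0, 1.0)
p = transform(M, v)
```

Each component of p comes from the dot product of one row of M with v, so the translated point is (11, 22, 33, 1).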

Fortunately though, you are not forced to program in assembly language. The
good news is that higher-level languages are available to write vertex and
fragment shaders. NVIDIA’s Cg, the OpenGL™ Shading Language (GLSL), and
Microsoft’s High Level Shading Language (HLSL) all provide similar interfaces
to the programmable aspects of graphics hardware. Using the notation of GLSL,
the same matrix multiplication performed above looks like this:

p = M * v; 

where p and v are vector data types (e.g., vec4) and M is a matrix data type
(e.g., mat4). As evidenced here, one advantage of using a higher-level language
over assembly language is that various data types are available to the programmer.
In all of these languages, there are built-in data types for storing vectors and
matrices, as well as arrays and constructs for creating structures. Many different
functions are also built in to these languages to help compute trigonometric values
(sin, cos, etc.), minimum and maximum values, exponential functions (log2, sqrt,
pow, etc.), and other math or geometric-based functions.


18.3.3 Vertex Shader Example 

Vertex shaders give you control over how your vertices are lit and transformed.
They are also used to set the stage for fragment shaders. An interesting aspect of
vertex shaders is that you are still able to use geometry-caching mechanisms, such
as display lists or VBOs, and thus benefit from their performance gains while
using vertex shaders to do computation on the GPU. For instance, if the vertices
represent particles and you can model the movement of the particles using a
vertex shader, you have nearly eliminated the CPU from these computations. Any
bottleneck in performance that may have occurred due to data being passed
between the CPU and the GPU will be minimized. Prior to the introduction of vertex
shaders, the computation of the particle movement would have been performed
on the CPU and each vertex would have been re-sent to the graphics hardware
on each iteration of your application. The ability to perform computations on the
vertices already stored in the graphics hardware memory is a big performance
win.

One of the simplest vertex shaders transforms a vertex into clip coordinates 
and assigns the front-facing color to the color attribute associated with the vertex. 

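void main(void)
{
    // Transform the vertex from model coordinates into clip coordinates.
    gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;

    // Pass the incoming vertex color through as the front-facing color.
    gl_FrontColor = gl_Color;
}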

In this example, gl_ModelViewProjectionMatrix is a built-in uniform 
variable supplied by the GLSL run-time environment. The variables gl_Vertex 
and gl_Color are built-in vertex attributes; the special output variables, 
gl_Position and gl_FrontColor are used by the vertex shader to set the 
transformed position and the vertex color. 

A more interesting vertex shader that implements the surface-shading
equations developed in Chapter 10 illustrates the effect of per-vertex shading using the
Phong shading algorithm.

void main(void)
{
    // Position and normal in eye space.
    vec4 v = gl_ModelViewMatrix * gl_Vertex;
    vec3 n = normalize(gl_NormalMatrix * gl_Normal);

    // Unit vector to the light and the half vector between light and viewer.
    vec3 l = normalize((gl_LightSource[0].position - v).xyz);
    vec3 h = normalize(l - normalize(v.xyz));

    float p = 16.0;  // Phong exponent

    vec4 cr = gl_FrontMaterial.diffuse;
    vec4 cl = gl_LightSource[0].diffuse;
    vec4 ca = vec4(0.2, 0.2, 0.2, 1.0);

    vec4 color;
    if (dot(h, n) > 0.0)
        color = cr * (ca + cl * max(0.0, dot(n, l))) +
                cl * pow(dot(h, n), p);
    else
        color = cr * (ca + cl * max(0.0, dot(n, l)));

    gl_FrontColor = color;
    gl_Position = ftransform();
}

From the code presented in this shader, you should be able to gain a sense of
shader programming and how it resembles C-style programming. Several things
are happening with this shader. First, we create a set of variables to hold the
vectors necessary for computing Phong shading: v, n, l, and h. The .xyz
swizzles select the first three components of the four-component vectors so that
they can be used with the vec3 variables. Note that the
computation in the vertex shader is performed in eye-space. This is done for a
variety of reasons, but one reason is that the light-source positions accessible within
a shader have already been transformed into the eye coordinate frame. When you
create shaders, the coordinate system that you decide to use will likely depend
on the types of computations being performed; this is an important factor to
consider. Also, note the use of built-in functions and data structures in the example.
In particular, there are several functions used in this shader: normalize, dot,
max, pow, and ftransform. These functions are provided with the shader
language. Additionally, the graphics state associated with materials and lighting
can be accessed through built-in uniform variables: gl_FrontMaterial
and gl_LightSource[0]. The diffuse component of the material and light
is accessed through the diffuse member of these variables. The color at the
vertex is computed using Equation (10.8) and then stored in the special output
variable gl_FrontColor. The vertex position is transformed using the function
ftransform, which is a convenience function that performs the multiplication
with the modelview and projection matrices. Figure 18.7 shows the results
from running this vertex shader with differently tessellated spheres. Because the
computations are performed on a per-vertex basis, a large amount of geometry is
required to produce a Phong highlight on the sphere that appears correct.

Figure 18.7. Each sphere is rendered using only a vertex shader that computes Phong
shading. Because the computation is being performed on a per-vertex basis, the Phong
highlight only begins to appear accurate after the amount of geometry used to model the
sphere is increased drastically. (See also Plate VIII.)


18.3.4 Fragment Shader Example 

Fragment shaders are written in a manner very similar to vertex shaders, and to
emphasize this, Equation (10.8) from Chapter 10 will be implemented with a
fragment shader. In order to do this, we first will need to write a vertex shader to
set the stage for the fragment shader.

The vertex shader required for this example is fairly simple, but introduces the 
use of varying variables to communicate data to the fragment shader. 

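varying vec4 v;  // eye-space position, interpolated for the fragment shader
varying vec3 n;  // eye-space normal, interpolated for the fragment shader

void main(void)
{
    v = gl_ModelViewMatrix * gl_Vertex;
    n = normalize(gl_NormalMatrix * gl_Normal);

    gl_Position = ftransform();
}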

Recall that varying variables will be set on a per-vertex basis by a vertex shader, 
but when they are accessed in a fragment shader, the values will vary (i.e., be 
interpolated) across the triangle, or geometric primitive. In this case, the vertex 
position in eye-space v and the normal at the vertex n are calculated at each 
vertex. The final computation performed by the vertex shader is to transform the 
vertex into clip coordinates since the fragment shader will compute the lighting 
at each fragment. It is not necessary to set the front-facing color in this vertex 
shader. 

The fragment shader program computes the lighting at each fragment using 
the Phong shading model. 

varying vec4 v;
varying vec3 n;

void main(void)
{
    // l and h are recomputed from the interpolated eye-space position;
    // .xyz selects three components so the results fit vec3 variables.
    vec3 l = normalize((gl_LightSource[0].position - v).xyz);
    vec3 h = normalize(l - normalize(v.xyz));

    float p = 16.0;  // Phong exponent

    vec4 cr = gl_FrontMaterial.diffuse;
    vec4 cl = gl_LightSource[0].diffuse;
    vec4 ca = vec4(0.2, 0.2, 0.2, 1.0);

    vec4 color;
    if (dot(h, n) > 0.0)
        color = cr * (ca + cl * max(0.0, dot(n, l))) +
                cl * pow(dot(h, n), p);
    else
        color = cr * (ca + cl * max(0.0, dot(n, l)));

    gl_FragColor = color;
}

The first thing you should notice is the similarity between the fragment shader
code in this example and the vertex shader code presented in Section 18.3.3. The
main difference is in the use of the varying variables, v and n. In the fragment
shader, the view vectors and normal values are interpolated across the surface of
the model between neighboring vertices. The results are shown in Figure 18.8.
Immediately, you should notice the Phong highlight on the quadrilateral, which
only contains four vertices. Because the shading is being calculated at the
fragment level using the Phong equation with the interpolated (i.e., varying) data,
more consistent and accurate Phong shading is produced with far less geometry.

Figure 18.8. The results of running the fragment shader from Section 18.3.4. Note that
the Phong highlight does appear on the left-most model which is represented by a single
polygon. In fact, because lighting is calculated at the fragment, rather than at each vertex,
the more coarsely tessellated sphere models also demonstrate appropriate Phong shading.
(See also Plate IX.)


18.3.5 General Purpose Computing on the GPU 

After studying the vertex and fragment shader examples, you may be wondering 
if you can write programs to perform other types of computations on the GPU. 
Obviously, the answer is yes, as many problems can be coded to run on the GPU 
given the various languages available for programming on the GPU. However, a 
few facts are important to remember. Foremost, floating point math processing 
on graphics hardware is not currently double-precision. Secondly, you will likely 
need to transform your problem into a form that fits within a graphics-related 
framework. In other words, you will need to use the graphics APIs to set up the 
problem, use texture maps as data rather than traditional memory, and write vertex 
and fragment shaders to frame and solve your problem. 

Having stated that, the GPU may still be an attractive platform for computation,
since the ratio of transistors that are dedicated to performing computation
is much higher on the GPU than it is on the CPU. In many cases, algorithms
run faster on a GPU than on a CPU. Furthermore, GPUs perform SIMD
computation, especially at the fragment-processing level. In fact,
it can often help to think about the computation occurring on the fragment
processor as a highly parallel version of a generic foreach construct, performing
simultaneous operations on a set of elements.

There has been a large amount of investigation into performing general purpose
computation on GPUs, often referred to as GPGPU. Among other things,
researchers are using the GPU as a means to simulate the dynamics of clouds (Harris
et al., 2003), implement ray tracers (Purcell et al., 2002; N. A. Carr et al., 2002),
compute radiosity (Coombe et al., 2004), perform 3D segmentation using level
sets (A. E. Lefohn et al., 2003), or solve the Navier-Stokes equations (Harris,
2004).

General purpose computation is often performed on the GPU using multiple
rendering “passes,” and most computation is done using the fragment processor
due to its highly data-parallel setup. Each pass, called a kernel, completes a
portion of the computation. Kernels work on streams of data, with several
kernels strung together to form the overall computation. The first kernel
completes the first part of the computation, the second kernel works on the
first kernel’s data, and so on, until the calculation is complete. In this style
of programming, working with data and data structures on the GPU is different
from conventional programming and does require a bit of thought. Fortunately,
recent efforts are providing abstractions and information for creating efficient
data structures for GPU programming (A. Lefohn et al., 2005).
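The kernel-and-stream idea can be illustrated on the CPU. The following is a minimal sketch in Python, not GPU code: `run_pass` stands in for one rendering pass, and the two kernels (`scale_kernel`, `offset_kernel`) are hypothetical names invented for this illustration. Each kernel is applied independently to every element of the stream, which is the foreach-like behavior described above.

```python
from typing import Callable, List, Sequence

Kernel = Callable[[float], float]


def run_pass(kernel: Kernel, stream: Sequence[float]) -> List[float]:
    """Apply one kernel to every element, like a foreach over fragments.

    Each output depends only on its own input element, so on a GPU
    all elements could be processed simultaneously.
    """
    return [kernel(x) for x in stream]


def run_pipeline(kernels: Sequence[Kernel], stream: Sequence[float]) -> List[float]:
    """String kernels together: each pass consumes the previous pass's output."""
    data = list(stream)
    for kernel in kernels:
        data = run_pass(kernel, data)
    return data


def scale_kernel(x: float) -> float:
    # Hypothetical first pass: scale each element.
    return 2.0 * x


def offset_kernel(x: float) -> float:
    # Hypothetical second pass: works on the first kernel's output.
    return x + 1.0


result = run_pipeline([scale_kernel, offset_kernel], [0.0, 1.0, 2.0])
print(result)  # [1.0, 3.0, 5.0]
```

On graphics hardware the stream would typically be stored in a texture map rather than a Python list, and each pass would be driven by rendering geometry with a fragment shader; the sketch only mirrors the data flow between passes.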

Using the GPU for general purpose programming does require that you
understand how to program the graphics hardware. For instance, most applications
that perform GPGPU will render a simple quadrilateral, or sets of
quadrilaterals, with vertex and fragment shaders operating on that geometry. The
geometry doesn’t have to be visible, or drawn to the screen, but it is necessary
to allow the vertex and fragment operations to occur. This focus on graphics
does make the learning curve for general purpose computing on this hardware an
adventure. Other recent efforts are working to make the interface to the GPU
more like traditional programming. The Brook for GPUs project (Buck et al.,
n.d.) is a system that provides a C-like interface for stream computations on
the GPU, which should allow more people to take advantage of the computational
power of modern graphics hardware.


Frequently Asked Questions 

• How do I debug shader programs? 

On most platforms, debugging both vertex shaders and fragment shaders is not 
simple. There is very little runtime support for debugging graphics applications 
in general, and even less available for runtime debugging of shader programs. 
However, this is starting to change. In the latest versions of Mac OS X, Linux, 
and Windows, support for shader programming is incorporated. A good solution 
for debugging shader programs is to use one of the shader development tools 
available from various graphics hardware manufacturers. 


Notes 


There are many good resources available to learn more about the technical
details involved with programming graphics hardware. A good starting point might
be the OpenGL™ Programming Guide (Shreiner et al., 2004). The OpenGL™
Shading Language (Rost, 2004) and The Cg Tutorial (Fernando & Kilgard, 2003)
provide details on how to program using a shading language. More advanced
technical information and examples for programming the vertex and fragment
processors can be found in the GPU Gems series of books
(Fernando, 2004; Pharr & Fernando, 2005). More information about general
purpose computation on GPUs (GPGPU) can be found on the
GPGPU.org web site (http://www.gpgpu.org).


Exercises 

1. How fast is the GPU as compared to performing the operations on the CPU? 
Write a program in which you can parameterize how much data is processed 
on the GPU, ranging from no computation using a shader program to all 
of the computation being performed using a shader program. How does 
the performance of your application change when the computation is being
performed solely on the GPU? 

2. Do some triangle strip lengths work better than others? Try to determine
the triangle strip length that maximizes performance. What does this tell
you about the memory, or cache structure, on the graphics hardware?



Plate XXVIII. The spectrum locus for the CIE 1931 standard observer. (See also
Figure 21.6.)

Plate XXIX. The chromaticity boundaries of the CIE RGB primaries at 435.8,
546.1, and 700 nm (solid) and a typical HDTV (dashed). (See also Figure 21.7.)

Plate XXX. The CIE u'v' chromaticity diagram. (See also Figure 21.8.)

Plate XXXI. A series of light sources plotted in the CIE u'v' chromaticity
diagram. A white piece of paper illuminated by any of these light sources
maintains a white color appearance. (See also Figure 21.11.)

Plate XXXII. An example of von Kries-style independent photoreceptor gain
control. The relative cone responses (solid line) and the relative adapted cone
responses to CIE illuminant A (dashed) are shown. The separate patch of color
represents CIE illuminant A rendered into the sRGB color space. (See also
Figure 21.12.)




















Plate XXXIII. Crysis exemplifies the realistic and detailed graphics expected
of first-person shooters. Image courtesy Crytek. (See also Figure 26.2.)

Plate XXXIV. An example of highly stylized, non-photorealistic rendering from
the game Okami. Image courtesy Capcom Entertainment, Inc. (See also
Figure 26.3.)

Plate XXXV. The LittleBigPlanet developers took care to choose techniques that
fit the game's constraints, combining them in unusual ways to achieve stunning
results. LittleBigPlanet © 2007 Sony Computer Entertainment Europe. Developed
by Media Molecule. LittleBigPlanet is a trademark of Sony Computer
Entertainment Europe. (See also Figure 26.4.)

Plate XXXVI. The normal map used in Figure 26.8. In this image, the red, green,
and blue channels of the texture contain the X, Y, and Z coordinates of the
surface normals. Image courtesy Keith Bruns. (See also Figure 26.9.)

Plate XXXVII. An early version of a diffuse color texture for the mesh from
Figure 26.8, shown in Photoshop. Image courtesy Keith Bruns. (See also
Figure 26.10.)

Plate XXXVIII. A rendering (in ZBrush) of the mesh with normal map and early
diffuse color texture (from Plate XXXVII) applied. Image courtesy Keith Bruns.
(See also Figure 26.11.)

Plate XXXIX. Final version of the color texture from Plate XXXVII. Image
courtesy Keith Bruns. (See also Figure 26.12.)

Plate XL. Rendering of the mesh with normal map and final color texture (from
Figure 26.12) applied. Image courtesy Keith Bruns. (See also Figure 26.13.)











































Plate XLI. Shader configuration in Maya. The interface on the right is used to
select the shader, assign textures to shader inputs, and set the values of
non-texture shader inputs (such as the “Specular Color” and “Specular Power”
sliders). The rendering on the left is updated dynamically while these
properties are modified, enabling immediate visual feedback. Image courtesy
Keith Bruns. (See also Figure 26.14.)

Plate XLII. The Tableau/Polaris system default mappings for four visual
channels according to data type. Image courtesy Chris Stolte (Stolte et al.,
2008), © 2008 IEEE. (See also Figure 27.6.)

Plate XLIII. Complex glyphs require significant display area so that the
encoded information can be read. Image courtesy Matt Ward, created with the
SpiralGlyphics software (M. O. Ward, 2002). (See also Figure 27.14.)



































Plate XLIV. Left: The standard rainbow colormap has two defects: it uses hue to denote ordering, and it is 
not perceptually isolinear. (See also Figure 27.8.) Right: The structure of the same dataset is far more clear 
with a colormap where monotonically increasing lightness is used to show ordering and hue is used instead 
for segmenting into categorical regions. (See also Figure 27.9.) Courtesy Bernice Rogowitz. 




Plate XLV. Top: A 3D representation of this time series dataset introduces the
problems of occlusion and perspective distortion. Bottom: The linked 2D views
of derived aggregate curves and the calendar allow direct comparison and show
more fine-grained patterns. Image courtesy Jarke van Wijk (van Wijk & van
Selow, 1999), © 1999 IEEE. (See also Figure 27.10.)




































































Plate XLVI. Tarantula shows an overview of source code using one-pixel lines
color coded by execution status of a software test suite. Image courtesy John
Stasko (Jones et al., 2002), © 2002 ACM, Inc. Included here by permission.
(See also Figure 27.11.)

Plate XLVII. Visual layering with size, saturation, and brightness in the
Constellation system (Munzner, 2000). (See also Figure 27.12.)

Plate XLVIII. The Improvise toolkit was used to create this multiple-view
visualization. Image courtesy Chris Weaver. (See also Figure 27.16.)

Plate XLIX. The TreeJuxtaposer system features stretch and squish navigation
and guaranteed visibility of regions marked with colors (Munzner et al., 2003).
(See also Figure 27.17.)




























































































Plate L. Dimensionality reduction with the Glimmer multidimensional scaling
approach shows clusters in a document dataset (Ingram et al., 2009), © 2009
IEEE. (See also Figure 27.19.)

Plate LI. Hierarchical parallel coordinates show high-dimensional data at
multiple levels of detail. Image courtesy Matt Ward (Fua et al., 1999), © 1999
IEEE. (See also Figure 27.21.)

Plate LII. Treemap showing a filesystem of nearly one million files. Image
courtesy Jean-Daniel Fekete (Fekete & Plaisant, 2002), © 2002 IEEE. (See also
Figure 27.25.)

Plate LIII. Two matrices of linked small multiples showing cancer demographic
data (MacEachren et al., 2003), © 2003 IEEE. (See also Figure 27.26.)















































Building Interactive Graphics 
Applications 


While most of the other chapters in this book discuss the fundamental algorithms
in the field of computer graphics, this chapter treats the integration of these
algorithms into applications. This is an important topic since the knowledge of
fundamental graphics algorithms does not always easily lead to an understanding
of the best practices in implementing these algorithms in real applications.

We start with a simple example: a program that allows the user to simulate the
shooting of a ball (under the influence of gravity). The user can specify initial
velocity, create balls of different sizes, shoot the ball, and examine the
parabolic free fall of the ball. Some fundamental concepts we will need include
mesh structure for the representation of the ball (sphere); texture mapping,
lighting, and shading for the aesthetic appearance of the ball; transformations
for the trajectories of the ball; and rasterization techniques for the
generation of the images of the balls.

To implement the simple ball shooting program, one also needs knowledge of 

• graphical user interface (GUI) systems for efficient and effective user
interaction;

• software architecture and design patterns for crafting an implementation
framework that is easy to maintain and expand;

• application program interfaces (APIs) for choosing the appropriate support
and avoiding a massive amount of unnecessary coding.

To gain an appreciation for these three important aspects of building the
application, we will complete the following steps:

• analyze interactive applications;

• understand different programming models and recognize important
functional components in these models;

• define the interaction of the components; 

• design solution frameworks for integrating the components; and 

• demonstrate example implementations based on different sets of existing 
APIs. 

We will use the ball shooting program as our example and begin by refining the 
detailed specifications. For clarity, we avoid graphics-specific complexities in 
3D space and confine our example to 2D space. Obviously, our simple program 
is neither sophisticated nor representative of real applications. However, with 
slightly refined specifications, this example contains all the essential components 
and behavioral characteristics of more complex real-world interactive systems. 

We will continue to build complexity into our simple example, adding new 
concepts until we arrive at a software architecture framework that is suitable for 
building general interactive graphics applications. We will examine the validity of 
our results and discuss how the lessons learned from this simple example can be 
applied to other familiar real-world applications (e.g., PowerPoint and Maya).


19.1 The Ball Shooting Program 

Our simple program has the following elements and behaviors. 

• The balls (objects). The user can left-mouse-button-click and drag out a
new ball (circle) anywhere on the screen (see Figure 19.1). Dragging out a
ball includes:

- (A). The initial mouse-button-click position defines the center of the
circle;

- (B). Holding the mouse button down and moving the mouse is the dragging
action;

- (C). The current mouse position while dragging defines the radius and the
initial velocity. The radius R (in pixel units) is the distance to the
center defined in (A). The vector from the current position to the center
is the initial velocity V (in units of pixels per second).

Figure 19.1. Dragging out a ball. Labels: (A) initial mouse click position;
(B) dragging; (C) current mouse position; (R) radius; (V) initial velocity.

Once created, the ball will begin traveling with the defined initial velocity. 

• HeroBall (Hero/active object). The user can also right-mouse-button-click
to select a ball to be the current HeroBall. The HeroBall’s velocity can be
controlled by the slider bars (discussed below), where its velocity is
displayed. (A newly created ball is by default the current HeroBall.) A
right-mouse-button-click on unoccupied space indicates that no current
HeroBall exists.

• Velocity slider bars (GUI elements). The user can monitor and control
two slider bars (x- and y-directions with magnitudes) to change the velocity
of the HeroBall. When there is no HeroBall, the slider bar values are
undefined.

• The simulation.

- Ball traveling/collisions (object intrinsic behaviors). A ball knows
how to travel based on its current velocity, and one ball can potentially
collide with another. For simplicity, we will assume all balls have
identical mass and all collisions are perfectly elastic.

- Gravity (external effects on objects). The velocity of a ball is
constantly changing due to the defined gravitational force.

- Status bar (application state echo). The user can monitor the
application state by examining the information in the status bar. In our
application, the number of balls currently on the screen is updated in
the status bar.

Our application starts with an empty screen. The user clicks and drags to
create new balls with different radii and velocities. Once a ball travels off
the screen, it is removed. To avoid unnecessary details, we do not include the
drawing of the motion trajectories or the velocity vector in our solutions.
Notice that a slider bar communicates its current state to the user in two
ways: the position of the slider knob and the numeric echo (see Figure 19.2).
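The drag-out rule in the specification (click position as center, distance to the current mouse position as radius, vector from the current position back to the center as initial velocity) can be written down directly. The following Python sketch is only an illustration under those stated rules; the `Ball` class and the `ball_from_drag` function are hypothetical names, not part of any GUI API.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Ball:
    center: Tuple[float, float]    # pixels
    radius: float                  # pixels
    velocity: Tuple[float, float]  # pixels per second


def ball_from_drag(click: Tuple[float, float], current: Tuple[float, float]) -> Ball:
    """Build a ball from the initial click (A) and current mouse position (C)."""
    # Vector from the current mouse position to the center: the velocity V.
    vx = click[0] - current[0]
    vy = click[1] - current[1]
    # Distance from the current position to the center: the radius R.
    radius = math.hypot(vx, vy)
    return Ball(center=click, radius=radius, velocity=(vx, vy))


ball = ball_from_drag(click=(100.0, 100.0), current=(103.0, 96.0))
print(ball.radius, ball.velocity)  # 5.0 (-3.0, 4.0)
```

Under this rule the speed (the length of V) always equals the radius R, so larger balls are also launched faster.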

We have now described the behavior of a simple interactive graphics
application. In the rest of this chapter, we will learn the concepts that
support the implementation of this type of application.

19.2 Programming Models 

For many of us, when we were first introduced to computer programming, we
learned that the program should always start and end with the main() function:
when the main() function returns, all the work must have been completed and the
program terminates. Since the overall control remains internal to the main()
function during the entire lifetime of the program, the model for this approach
to solving problems is called an internal control model, or control-driven
programming. As we will see, an alternative paradigm, event-driven programming
or an external control model approach, is the more appropriate way to design
solutions to interactive programs.

In this section, we will first formulate a solution to the 2D ball shooting
program based on the, perhaps more familiar, control-driven programming model.
We will then analyze the solution, identify shortcomings, and describe the
motivation for the external control model or event-driven programming approach.

The pseudocode that follows is C++/Java-like. We assume typical functionality
from the operating system (OperatingSystem::) and from a graphical user
interface API (GUISystem::). The purpose of the pseudocode is to assist us in
analyzing the foundation control structure (i.e., if/while/case) of the
solution. For this reason, the details of application- and graphics-specific
operations are intentionally glossed over. For example, the details of how to
UpdateSimulation() are purposely omitted.


19.2.1 Control-Driven Programming 

The main advantage of control-driven programming is that it is fairly
straightforward to translate a verbal description of a solution to a program
control structure. In this case, we verbalize our solution as follows:

while the user does not want to quit (A);
parse and execute the user’s command (B);
update the velocities and positions of the balls (C);
then draw all the balls (D);
and finally, before we poll the user for another command, tell the user what
is going on by echoing current application state to the status bar (E).


(A): As long as the user is not ready to quit
(B): Parse the user command
(C): Periodically update positions and velocities of the balls
(D): Draw all balls to the computer screen
(E): Set status bar with number of balls

(A)  while user command is not quit
(B)      parse and execute user’s command
(C)      if (OperatingSystem::SufficientClockTimeHasElapsed)
             UpdateSimulation()      // update the positions and velocities
                                     // of all the balls (in AllWorldBalls set)
(D)      DrawBalls(AllWorldBalls)    // all the balls in AllWorldBalls set
(E)      EchoToStatusBar()           // sets status bar: number of balls on screen

Figure 19.3. Programming structure from a verbalized solution.

Figure 19.3 shows a translation from this verbal solution into a simple
programming structure. We introduce the set of AllWorldBalls to represent all
the balls that are currently on the computer screen. The only other difference
between the pseudocode in Figure 19.3 and our verbalized solution is in the
added elapsed time check in Step (C): SufficientClockTimeHasElapsed. (Recall
that the velocities are defined in pixels per second.) To support proper pixel
displacements, we must know real elapsed time between updates.
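Because velocities are expressed in pixels per second, each update must scale them by the real time that has passed. The source deliberately omits the details of UpdateSimulation(), so the following Python sketch is only one possible reading: the gravity constant, the downward-growing y axis, the simple Euler-style integration, and the off-screen test are all assumptions made for illustration, and collisions are left out.

```python
from dataclasses import dataclass
from typing import List

GRAVITY = 200.0  # assumed downward acceleration, pixels per second squared


@dataclass
class Ball:
    x: float
    y: float
    vx: float
    vy: float
    radius: float


def update_simulation(balls: List[Ball], dt: float,
                      width: float, height: float) -> List[Ball]:
    """Advance every ball by dt seconds and drop balls that left the screen.

    Screen y is assumed to grow downward, so gravity increases vy.
    """
    survivors = []
    for b in balls:
        b.vy += GRAVITY * dt      # gravity changes the velocity
        b.x += b.vx * dt          # displacement = velocity * elapsed time
        b.y += b.vy * dt
        on_screen = (-b.radius <= b.x <= width + b.radius
                     and -b.radius <= b.y <= height + b.radius)
        if on_screen:
            survivors.append(b)
    return survivors


balls = [Ball(x=10.0, y=10.0, vx=50.0, vy=0.0, radius=5.0)]
balls = update_simulation(balls, dt=0.5, width=640.0, height=480.0)
print(balls[0].x, balls[0].y, balls[0].vy)  # 35.0 60.0 100.0
```

In a running program, `dt` would come from a clock such as Python's `time.monotonic()`, measured between consecutive updates, which is the role played by the elapsed time check in Step (C).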

As we add additional details to parse and execute the user’s commands (B), the
solution must be expanded. The revised solution in Figure 19.4 shows the details
of a central parsing switch statement (B) and the support for all three commands
a user can issue: defining a new HeroBall (B1); selecting a HeroBall (B2); and
adjusting current HeroBall velocity with the slider bars (B3). Undefined user
actions (e.g., mouse movement with no button pressed) are simply ignored (B4).

Notice that HeroBall creation (B1) involves three user actions: mouse down
(B1), followed by mouse drag (B1-1), and finally mouse up (B1-2). The parsing
of this operation is performed in multiple consecutive passes through the outer
while-loop (A): the first time through, we create the new HeroBall (B1); in the
subsequent passes, we perform the actual dragging operation (B1-1). We assume
that mouse drag (B1-1) will never be invoked without a mouse button down (B1)
action, and thus the HeroBall is always defined during the dragging operation.

The LeftMouseButtonUp action (B1-2) is an implicit action not defined in the
original specification. In our implementation, we choose this implicit action to
activate the insertion of the new HeroBall into the AllWorldBalls set. In this
way the HeroBall is not a member of the AllWorldBalls set until after the user has
completed the dragging operation. This delay ensures that the HeroBall’s velocity
and position will not be affected when the UpdateSimulation() procedure updates
all the balls in the AllWorldBalls set (C). This means a user can take the time
to drag out a new HeroBall without worrying that the ball will free fall before
the release of the mouse button. The simple amendment in the drawing operation
(D1) ensures a proper drawing of the new HeroBall before it is inserted into the
AllWorldBalls set.

When we examine this solution in the context of supporting user interaction, 
we have to concern ourselves with efficiency issues as well as the potential for 
increased complexity. 


main()
{
(A):    while (GUISystem::UserAction != Quit) {
(B):        switch (GUISystem::UserAction) {
(B1):           // Define new HeroBall: begins creating a new HeroBall
                case GUISystem::LeftMouseButtonDown:
                    HeroBall = CreateHeroBall()  // hero not in AllWorldBalls set
                    DefiningNewHeroBall = true
(B1-1):         // Support for drag actions: drags out the new HeroBall
                case GUISystem::LeftMouseButtonDrag:
                    RefineRadiusAndVelocityOfHeroBall()
                    SetSliderBarsWithHeroBallVelocity()
(B1-2):         // Implicit action: finishes creating the new HeroBall
                case GUISystem::LeftMouseButtonUp:
                    InsertHeroBallToAllWorldBalls()
                    DefiningNewHeroBall = false
(B2):           // Select current HeroBall
                case GUISystem::RightMouseButtonDown:
                    HeroBall = SelectHeroBallBasedOnCurrentMouseXY()
                    if (HeroBall != null)
                        SetSliderBarsWithHeroBallVelocity()
(B3):           // Set HeroBall velocity with slider bars
                case GUISystem::SliderBarChange:
                    if (HeroBall != null)
                        SetHeroBallVelocityWithSliderBarValues()
(B4):           // Undefined actions are ignored: all other user actions,
                // e.g., mouse move with no buttons
                default:
            } // end of switch (UserAction)
(C):        // Move balls by velocities under gravity and remove off-screen ones
            if (OperatingSystem::SufficientClockTimeHasElapsed)
                UpdateSimulation()
(D):        DrawBalls(AllWorldBalls)
(D1):       // Draw the new HeroBall that is currently being defined
            if (DefiningNewHeroBall)
                DrawBalls(HeroBall)
(E):        EchoToStatusBar()  // sets status bar with number of balls on screen
        } // end of while (UserAction != Quit)
} // end of main() function. Program terminates.

Figure 19.4. Programming solution based on the control-driven programming model.

Efficiency Concerns. Typically a user interacts with an application in bursts of
activity: continuous actions followed by periods of idling. This can be explained
by the fact that, as users, we typically perform some tasks in the application and
then spend time examining the results. For example, when working with a word
processor, our typical work pattern consists of bursts of typing/editing followed
by periods of reading (with no input action). In our example application, we
can expect the user to drag out some circles and then observe the free-falling of
the circles. The continuous while-loop polling of user commands in the main()
function means that when the user is not performing any action, our program will
still be actively running and wasting machine resources. During activity bursts,
at the maximum, users are capable of generating hundreds of input actions per
second (e.g., mouse-pixel movements per second). If we compare this rate to
typical CPU instruction rates, which are on the order of 10^9 per second, the
huge discrepancy indicates that, even during activity bursts, the user
command-parsing switch statement (B) spends most of the time in the default
case doing nothing.

Complexity Concerns. Notice that our entire solution is in the main() function.
This means that all relevant user actions must be parsed and handled by the user
command-parsing switch statement (B). In a modern multi-program shared window
environment, many actions performed by users are actually non-application
specific. For example, if a user performs a left mouse button click or
drag in the drawing area of the program window, our application should react by
dragging out a new HeroBall. However, if the user performs the same actions in
the title area of the program window, our application should forward these actions
to the GUI/Operating/Window system and commence the coordination of moving
the entire program window. As experienced users in window environments, we
understand that there are numerous such non-application specific operations, and
we expect all applications to honor these actions (e.g., iconizing, resizing,
raising, or lowering a window). Following the solution given in Figure 19.4, for
every user action that we want to honor, we must include a matching supporting
case in the parsing switch statement (B). This requirement quickly increases the
complexity of our solution and becomes a burden when implementing any
interactive application.

An efficient GUI system should remain idle by default (not taking up machine
resources) and only become active in the presence of interesting activities
(e.g., user input actions). Furthermore, to integrate interactive applications in
sophisticated multi-programming window environments, it is important that the
supporting GUI system automatically takes care of mundane and standard user
actions.


19.2.2 Event-Driven Programming 

Event-driven programming remedies the efficiency and complexity concerns with
a default MainEventLoop() function defined in the GUI system. For event-driven
programs, the MainEventLoop() replaces the main() function, because all
programs start and end in this function. Just as in the case of the main()
function for control-driven programming, when the MainEventLoop() function
returns, all work should have been completed, and the program terminates. The
MainEventLoop() function defines the central control structure for all
event-driven programming solutions and typically cannot be changed by a user
application. In this way, the overall control of an application is actually
external to the user’s program code. For this reason, event-driven programming
is also referred to as the external control model.

GUISystem::MainEventLoop() {
(A):    // For application initialization
        SystemInitialization()
            // For initialization of application state and
            // registration of event service routines
(B):    // Continuous outer loop
        loop forever {
(C):        // Stop and wait for next event
            WaitFor(GUISystem::NextEvent)
                // Program will stop and wait for the next event
(D):        // Central parsing switch statement
            switch (GUISystem::NextEvent) {
                case GUISystem::LeftMouseButtonDown:
                    if (user application registered for this event)
                        Execute user defined service routine.
                    else
                        Execute default GUISystem routine.
                // ... every possible event ...
                case GUISystem::Iconize:
                    if (user application registered for this event)
                        Execute user defined service routine.
                    else
                        GUISystem::DefaultIconizeBehavior()
            } // end of switch (GUISystem::NextEvent)
        } // end of loop forever
} // end of GUISystem::MainEventLoop() function. Program terminates.

Figure 19.5. The default MainEventLoop function.

Figure 19.5 depicts a typical MainEventLoop() implementation. In this case,
our program is the user application that is based on the MainEventLoop()
function. Structurally, the MainEventLoop() is very similar to the main()
function of Figure 19.4, with a continuous loop (B) containing a central parsing
switch statement (D). The important differences between the two functions
include:

• (A) SystemInitialization(). Recall that event-driven programs start and end
in the MainEventLoop() function. SystemInitialization() is a mechanism
defined to invoke the user program from within the MainEventLoop(). It
is expected that user programs implement SystemInitialization() to
initialize the application state and to register event service routines
(refer to the discussion in (D)).

• (B) Continuous outer loop. Since this is a general control structure to
be shared by all event-driven programs, there is no way to determine the
termination condition. User programs are expected to override appropriate
event service routines and terminate the program from within the service
routine.

• (C) Stop and wait. Instead of actively polling the user for actions (wasting
machine resources), the MainEventLoop() typically stops the entire
application process and waits for asynchronous operating system calls to
reactivate the application process in the presence of relevant user actions.

• (D) Events and central parsing switch statement. Included in this
statement are all possible actions/events (cases) that a user can perform.
Associated with each event (case) is a default behavior and a toggle
that allows user applications to override the default behavior. During
SystemInitialization(), the user application can register an alternate service
routine for an event by toggling the override.

To develop an event-driven solution, our program must first register event service
routines with the GUI system. After that, our entire program solution is based on
waiting and servicing user events. While control-driven programming solutions
are based on an algorithmic organization of control structures in the main()
function, an event-driven programming solution is based on the specification of
events that cause changes to a defined application state. This is a different
paradigm for designing programming solutions. The key difference here is that,
as programmers, we have no explicit control over the algorithmic organization
of the events: over which, when, or how often an event should occur.
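The registration-and-dispatch mechanism described above can be mimicked in a few lines. The Python sketch below is a toy stand-in, not the API of any real GUI toolkit: the class `GuiSystem`, its methods, and the event names are assumptions invented for illustration, and a pre-filled queue plays the role of the user. The blocking `get()` call corresponds to the stop-and-wait step (C), and the dictionary lookup corresponds to the central parsing switch statement (D).

```python
import queue
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class GuiSystem:
    """Toy stand-in for a GUI system that owns the main event loop."""

    def __init__(self) -> None:
        self.events: "queue.Queue[dict]" = queue.Queue()
        self.handlers: Dict[str, Handler] = {}
        self.log: List[str] = []
        self.running = True

    def register(self, event_type: str, handler: Handler) -> None:
        # The application overrides the default behavior for one event type.
        self.handlers[event_type] = handler

    def quit(self) -> None:
        # Termination is requested from inside a service routine.
        self.running = False

    def default_behavior(self, event: dict) -> None:
        # Standard actions are handled on the application's behalf.
        if event["type"] == "Quit":
            self.quit()
        else:
            self.log.append("default:" + event["type"])

    def main_event_loop(self, system_initialization: Callable[["GuiSystem"], None]) -> None:
        system_initialization(self)          # (A) application registers routines
        while self.running:                  # (B) continuous outer loop
            event = self.events.get()        # (C) blocks until an event arrives
            handler = self.handlers.get(event["type"])  # (D) central dispatch
            if handler is not None:
                handler(event)
            else:
                self.default_behavior(event)


def system_initialization(gui: GuiSystem) -> None:
    gui.register("LeftMouseButtonDown",
                 lambda e: gui.log.append("app:" + e["type"]))


gui = GuiSystem()
for name in ("LeftMouseButtonDown", "Iconize", "Quit"):
    gui.events.put({"type": name})   # pre-queued events stand in for user input
gui.main_event_loop(system_initialization)
print(gui.log)  # ['app:LeftMouseButtonDown', 'default:Iconize']
```

Note that the application code never calls its own handler; it only registers it, and the loop decides when the handler runs. The unregistered Iconize event is still honored through the default behavior.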

The program in Figure 19.6 implements the left mouse button operations for
our ball shooting program. We see that during system initialization (A), the
program defines an appropriate application state (A1) and registers left mouse
button (LMB) down/drag/up events (A2). The corresponding event service routines
(D1, D2, and D3) are also defined. At the end of each event service routine, we
redraw all the balls to ensure that the user can see an up-to-date display at
all times. Notice the absence of any control structure organizing the
initialization and service routines. Recall that this is an event-driven
program: the overall control structure is defined in the MainEventLoop(), which
is external to our solution.

Figure 19.7 shows how our program from Figure 19.6 is linked with the pre¬ 
defined MainEventLoop^) from the GUI system. The MainEventLoop () calls the 
SystemInitialization() function defined in our solution (A). As described, after 
the initialization, our entire program is essentially the three event service rou¬ 
tines (Dl, D2, and D3). However, we have no control over the invocation of 
(A) System Initialization:

    (A1): Define Application State:

        AllWorldBalls: A set of defined Balls, initialize to empty
        HeroBall: current active ball, initialize to null

    (A2): Register Event Service Routines

        Register for: Left Mouse Button Down Event
        Register for: Left Mouse Button Drag Event
        Register for: Left Mouse Button Up Event
        // We care about these events, inform us if these events happen

(D) Events Services:

    (D1): Left Mouse Button Down // service routine for this event
        HeroBall = Create a new ball at current mouse position
        DrawAllBalls(AllWorldBalls, HeroBall) // Draw all balls (including HeroBall)

    (D2): Left Mouse Button Drag // service routine for this event
        RefineRadiusAndVelocityOfHeroBall()
        DrawAllBalls(AllWorldBalls, HeroBall) // Draw all balls (including HeroBall)

    (D3): Left Mouse Button Up // service routine for this event
        InsertHeroBallToAllWorldBalls()
        DrawAllBalls(AllWorldBalls, null) // Draw all balls


Figure 19.6. A simple event-driven program specification.


these routines. Instead, a user performs actions that trigger events, which
drive these routines. These routines in turn change the application state. In
this way, an event-driven programming solution is based on the specification of
events (LMB events) that cause changes to a defined application state
(AllWorldBalls and HeroBall). Since the user command parsing switch statement
(D in Figure 19.7) in the MainEventLoop() contains a case and the corresponding
default behavior for every possible user action, our solution honors, without
any added complexity, the non-application-specific actions in the environment
(e.g., iconifying, moving, etc.).

In the context of event-driven programming, an event can be perceived as
an asynchronous notification that something interesting has happened. The
messenger for the notification is the underlying GUI system. The mechanism for
receiving an event is via overriding the corresponding event service routine.

For these reasons, when discussing event-driven programming, there is always
a supporting GUI system. This GUI system is generally referred to as the
graphical user interface (GUI) application programming interface (API).
Examples of GUI APIs include the Java Swing Library, the OpenGL Utility
Toolkit (GLUT), the Fast Light Toolkit (FLTK), and the Microsoft Foundation
Classes (MFC).

From the above discussion, we see that the registration for services of
appropriate events is the core of designing and developing solutions for
event-driven programs. Before we begin developing a complete solution for our
ball shooting program, let us spend some time understanding events.
Pre-defined GUI system function:

GUISystem::MainEventLoop() {
    SystemInitialization()                        // (A) This will call the user-defined function
    loop forever {                                // (B)
        WaitFor(GUISystem::NextEvent)             // (C) Process will stop and wait for the next event
        switch (GUISystem::NextEvent) {           // (D)
            case GUISystem::LeftMouseButtonDown:  // (D1)
                if (user application registered for this event)
                    Invoke LeftMouseButtonDownServiceRoutine(currentMousePosition)
                else
                    Execute default GUISystem routine.
            case GUISystem::LeftMouseButtonDrag:
                if (user application registered for this event)
                    Invoke LeftMouseButtonDragServiceRoutine(currentMousePosition)
                else
                    Execute default GUISystem routine.
            case GUISystem::LeftMouseButtonUp:
                if (user application registered for this event)
                    Invoke LeftMouseButtonUpServiceRoutine(currentMousePosition)
                else
                    Execute default GUISystem routine.
            : // there are many other events that do not concern us
        } // end of switch(GUISystem::NextEvent)
    } // end of loop forever
} // end of GUISystem::MainEventLoop() function. Program terminates.

User solution program:

SystemInitialization() { // (A)
    // (A1): Define Application State:
    AllWorldBalls: A set of defined Balls, initialize to empty
    HeroBall = null

    // (A2): Register Event Service Routines
    // The registrations establish the links to the Invoke statements above.
    GUISystem::RegisterServiceRoutine(GUISystem::LMBDown, LMBDownRoutine)
    GUISystem::RegisterServiceRoutine(GUISystem::LMBDrag, LMBDragRoutine)
    GUISystem::RegisterServiceRoutine(GUISystem::LMBUp, LMBUpRoutine)
    // "LMB" stands for: Left Mouse Button
}

// Event Service Routines (D)

LMBDownRoutine(mousePosition) // D1: Left Mouse Button Down service routine
    HeroBall = new ball at mousePosition
    DrawAllBalls(AllWorldBalls, HeroBall) // Draw all balls (including HeroBall)

LMBDragRoutine(mousePosition) // D2: Left Mouse Button Drag service routine
    RefineRadiusAndVelocityOfHeroBall(mousePosition)
    DrawAllBalls(AllWorldBalls, HeroBall) // Draw all balls (including HeroBall)

LMBUpRoutine(mousePosition) // D3: Left Mouse Button Up service routine
    InsertHeroBallToAllWorldBalls()
    DrawAllBalls(AllWorldBalls, null) // Draw all balls


Figure 19.7. Linking MainEventLoop with our solution.
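
The linkage in Figure 19.7 can also be tried out in a few lines of Python. The sketch below is only an illustration under stated assumptions: the GUISystem class, its post() method, and the event names are hypothetical stand-ins, not the API of any real GUI library. Events are queued in advance so that the loop terminates, whereas a real main event loop blocks and runs forever, and drawing is omitted.

```python
from collections import deque


class GUISystem:
    """Toy stand-in for a GUI system; every name here is hypothetical."""

    def __init__(self):
        self.handlers = {}       # event name -> registered service routine
        self.pending = deque()   # queued (event name, mouse position) pairs
        self.default_log = []    # events that fell through to default handling

    def register_service_routine(self, event, routine):
        self.handlers[event] = routine

    def post(self, event, position=None):
        self.pending.append((event, position))

    def main_event_loop(self, system_initialization):
        system_initialization(self)                    # (A)
        while self.pending:                            # (B) a real loop never ends
            event, position = self.pending.popleft()   # (C) a real loop would block here
            routine = self.handlers.get(event)         # (D) dispatch on the event
            if routine is not None:
                routine(position)
            else:
                self.default_log.append(event)         # default GUI system behavior


# User solution: application state (A1) and service routines (D1-D3).
all_world_balls = []
hero_ball = None


def lmb_down_routine(position):
    global hero_ball
    hero_ball = {"center": position, "radius": 0.0}


def lmb_drag_routine(position):
    cx, cy = hero_ball["center"]
    hero_ball["radius"] = ((position[0] - cx) ** 2 + (position[1] - cy) ** 2) ** 0.5


def lmb_up_routine(position):
    global hero_ball
    all_world_balls.append(hero_ball)
    hero_ball = None


def system_initialization(gui):
    # (A2) register the service routines we care about
    gui.register_service_routine("LMBDown", lmb_down_routine)
    gui.register_service_routine("LMBDrag", lmb_drag_routine)
    gui.register_service_routine("LMBUp", lmb_up_routine)


gui_system = GUISystem()
for event, position in [("LMBDown", (10, 10)), ("LMBDrag", (13, 14)),
                        ("LMBUp", (13, 14)), ("WindowMoved", None)]:
    gui_system.post(event, position)
gui_system.main_event_loop(system_initialization)
print(len(all_world_balls), all_world_balls[0]["radius"], gui_system.default_log)
```

The unregistered WindowMoved event falls through to the default branch, which mirrors how the switch statement (D) honors actions that the application never registered for.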


Graphical User Interface (GUI) Events 

In general, an application may receive events generated by the user, the
application itself, or by the GUI system. In this section, we describe each of
these event sources and discuss the application’s role in servicing these events.

S1: The User. These are events triggered by the actions a user performs on the
input devices. Notice that input devices include actual hardware devices (e.g.,
mouse, keyboard, etc.) and/or software-simulated GUI elements (e.g., slider bars,
combo boxes, etc.). Typically, a user performs actions for two very different
reasons:

• S1a: Application specific. These are input actions that are part of the
application. Clicking and dragging in the application screen area to create a
HeroBall is an example of an action performed on a hardware input device.
Changing the slider bars to control the HeroBall’s velocity is an example of
an action performed on a software-simulated GUI element. Both of these
actions and the resulting events are application specific; the application (our
program) is solely responsible for servicing these events.

• S1b: General. These are input actions defined by the operating environment.
For example, a user clicks and drags in the window title-bar area
expecting to move the entire application window. The servicing of these
types of events requires collaboration between our application and the GUI
system. We will discuss the servicing of these types of events in more detail
when explaining events that originate from the GUI system in S3c.

Notice that the meaning of a user’s action is context sensitive. It depends on where
the action is performed: click and drag in the application screen area vs. slider
bar vs. application window title-bar area. In any case, the underlying GUI system
is responsible for parsing the context and determining which application element
receives a particular event.

S2: The Application. These are events defined by the application, typically
depending on some run-time conditions. During run time, if and when the condition
is favorable, the supporting GUI system triggers the event and conveys the
favorable conditions to the application. A straightforward example is a periodic
alarm. Modern GUI systems typically allow an application to define (sometimes
multiple) timer events. Once defined, the GUI system will trigger an event to wake
up the application when the timer expires. As we will see, this timer event is
essential for supporting real-time simulations. Since the application (our program)
requested the generation of these types of events, our program is solely
responsible for servicing them. The important distinction between application-defined
and user-generated events is that application-defined events can be spontaneous:
when properly defined, these types of events may trigger even when the user is
not doing anything.
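
As a rough illustration of such a periodic alarm, the sketch below drives a timer from a simulated clock, so no user action is involved at all. The TimerSource class and its advance_to() method are hypothetical names introduced for this example; a real GUI system would trigger the routine from its own main event loop.

```python
class TimerSource:
    """Hypothetical periodic timer driven by a simulated clock (milliseconds)."""

    def __init__(self, period_ms, routine):
        self.period_ms = period_ms
        self.routine = routine        # the registered timer service routine
        self.next_due = period_ms     # time at which the next event fires

    def advance_to(self, now_ms):
        """Fire the routine once for every period that has elapsed up to now_ms."""
        fired = 0
        while self.next_due <= now_ms:
            self.routine(self.next_due)
            self.next_due += self.period_ms
            fired += 1
        return fired


ticks = []                               # times at which the timer event was serviced
timer = TimerSource(50, ticks.append)    # a 50 ms period
fired_first_second = timer.advance_to(1000)
fired_shortly_after = timer.advance_to(1020)
print(fired_first_second, fired_shortly_after)
```

With a 50 ms period the routine is serviced 20 times during the first simulated second, and nothing fires again until the next full period has elapsed.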

S3: The GUI System. These are events that originate from within the GUI
system in order to convey state information to the application. There are typically
three reasons for these events:

• S3a: Internal GUI states change. These are events signaling an internal
state change of the GUI system. The GUI system typically generates an
event before the creation of the application’s main window. This provides
an opportunity for the application to perform the corresponding initialization.
In some GUI systems (e.g., MFC) the SystemInitialization() functionality
is accomplished with these types of events: user applications are
expected to override the appropriate window creation event and initialize
the application state. Modern, general-purpose commercial GUI systems
typically define a large number of events signaling detailed state changes in
anticipation of supporting different types of applications and requirements.
For example, for the creation of the application's main window, the GUI
system may define events for the following states:

- before resource allocation;

- after resource allocation but before initialization;

- after initialization but before initial drawing, etc.

A GUI system usually defines meaningful default behaviors for such events.
To program an effective application based on a GUI system, one must
understand the different groups of events and only service the appropriate
selections.

• S3b: External environment requests attention. These are events indicating
that there are changes in the operating environment that potentially
require application attention. For example, a user has moved another
application window to cover a portion of our application window, or a user
has minimized our application window. The GUI system and the window
environment typically have appropriate service routines for these types of
events. An application would only choose to service these events when
special actions must be performed. For example, in a real-time simulation
program, the application may choose to suspend the simulation if the
application window is minimized. In this situation, an application must service
the minimize and maximize events.

• S3c: External environment requests application collaboration. These
are typically events requesting the application’s collaboration to complete
the service of general user actions (please refer to S1b). For example, if
a user click-drags the application window’s title bar, the GUI system reacts
by letting the user “drag” the entire application window. This “drag”
operation is implemented by continuously erasing and redrawing the entire
application window at the current mouse pointer position on the computer
display. The GUI system has full knowledge of the appearance of the
application window (e.g., the window frames, the menus, etc.), but it has no
knowledge of the application window content (e.g., how many free-falling
balls traveling at what velocity, etc.). In this case, the GUI system redraws
the application window frame and generates a Redraw/Paint event for the
application, requesting assistance in completing the service of the user’s
“drag” operation. As an application in a shared window environment, our
application is expected to honor and service these types of events. The most
common events in this category include Redraw/Paint and Resize.
Redraw/Paint is the single most important event an application must service,
because it supports the most common operations a user may perform in a
shared window environment. Resize is also an important event to which
the application must respond because the application is in charge of GUI
element placement policy (e.g., if window size is increased, how should the
GUI elements be placed in the larger window).


19.2.3 The Event-Driven Ball Shooting Program 

In Section 19.2.1, we started a control-driven programming solution to the ball 
shooting program based on verbalizing the conditions (controls) under which the 
appropriate actions should be taken: 

while favorable condition, parse the input... 

As we have seen, with appropriate modifications, we were able to detail the
control structures for our solution.

From the discussion in Section 19.2.2, we see that to design an event-driven 
programming solution we must 

1. define the application state; 

2. describe how user actions change the application state; 

3. map the user actions to events that the GUI system supports; and 

4. override corresponding event service routines to implement user actions. 

The specification in Section 19.1 detailed the behaviors of our ball shooting
program. The description is based on actions performed on familiar input devices
(e.g., slider bars and mouse) that change the appearance on the display screen.
Thus, the specification from Section 19.1 describes items (2) and (3) from the
above list without explicitly defining what the application state is. Our job in
designing a solution is to derive the implicitly defined application state and
design the appropriate service routines.

Figure 19.8 presents our event-driven programming solution. As expected,
the application state (A1) is defined in SystemInitialization(). The AllWorldBalls
set and HeroBall can be derived from the specification in Section 19.1. The
DefiningNewHeroBall flag is a transient (temporary) application state designed to
support user actions across multiple events (click-and-drag). Using transient
application states is a common approach to support consecutive inter-related events.

Figure 19.8 shows the registration of three types of service routines (A2): 

• user-generated application specific events (S1a);

• an application defined event (S2); 

• a GUI system-generated event requesting collaboration (S3c). 

The timer event definition (A2S2) sets up a periodic alarm for the application to
update the simulation of the free-falling balls. The service routines of the
user-generated application specific events (D1-D5) are remarkably similar to the
corresponding case statements in the control-driven solution presented in Figure 19.4
(B1-B3). It should not be surprising that this is so, because we are implementing
the exact same user actions based on the same specification. Line 3 of the
LMBDownRoutine() (D1L3) demonstrates that, when necessary, our application
can request the GUI system to initiate events. In this case, we signal the GUI
system that an application redraw is necessary. Notice that event service routines
are simply functions in our program. This means that at D1L3 we could also call
RedrawRoutine() (D7) directly. The difference is that a call to RedrawRoutine()
will force a redraw immediately, while requesting the generation of a redraw event
allows the GUI system to optimize the number of redraws. For example, if the
user performs an LMB click and starts dragging immediately, with our D1 and D2
implementation, the GUI system can gather the many GenerateRedrawEvent
requests made in a short period of time and generate only one redraw event. In
this way, we can avoid performing more redraws than necessary.
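
One way a GUI system might gather redraw requests is with a pending flag, as in the minimal sketch below. This is an assumption about one possible implementation, not a description of any particular library: the RedrawCoalescer class and its flush() method are hypothetical, and flush() stands for the point at which the main event loop decides to deliver the redraw event.

```python
class RedrawCoalescer:
    """Hypothetical model of a GUI system that merges redraw requests."""

    def __init__(self, redraw_routine):
        self.redraw_routine = redraw_routine   # the registered RedrawRoutine
        self.redraw_pending = False
        self.redraw_count = 0

    def generate_redraw_event(self):
        # Requests made before the next flush collapse into a single event.
        self.redraw_pending = True

    def flush(self):
        # Called by the (hypothetical) main event loop between other events.
        if self.redraw_pending:
            self.redraw_pending = False
            self.redraw_count += 1
            self.redraw_routine()


frames = []
coalescer = RedrawCoalescer(lambda: frames.append("frame"))
for _ in range(25):                  # e.g., a burst of drag events
    coalescer.generate_redraw_event()
coalescer.flush()
coalescer.flush()                    # nothing pending, so nothing is redrawn
print(coalescer.redraw_count)
```

Calling the redraw routine directly from each service routine would instead draw 25 frames for the same burst of requests.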

In order to achieve a smooth animation, we should perform about 20-40 updates
per second. It follows that the SimulationUpdateInterval should be no more
than 50 milliseconds so that the ServiceTimer() routine can be invoked at least
20 times per second. (Notice that a redraw event is requested by the
ServiceTimer() routine.) This means that our application is guaranteed to
receive at least 20 redraw events every second. For this reason, the
SystemInitialization() { // (A)

    // (A1): Define Application State
    AllWorldBalls: A set of defined Balls, initialize to empty
    HeroBall = null
    DefiningNewHeroBall = false

    // (A2): Register Event Service Routines

    // S1a: Application Specific User Events
    GUISystem::RegisterServiceRoutine(GUISystem::LMBDown, LMBDownRoutine)
    GUISystem::RegisterServiceRoutine(GUISystem::LMBDrag, LMBDragRoutine)
    GUISystem::RegisterServiceRoutine(GUISystem::LMBUp, LMBUpRoutine)
    GUISystem::RegisterServiceRoutine(GUISystem::RMBDown, RMBDownRoutine)
    GUISystem::RegisterServiceRoutine(GUISystem::SliderBar, SliderBarRoutine)

    // S2: Application Defined Event (A2S2: defines a timer event)
    GUISystem::DefineTimerPeriod(SimulationUpdateInterval)
    GUISystem::RegisterServiceRoutine(GUISystem::TimerEvent, ServiceTimer)
    // Triggers TimerEvent every SimulationUpdateInterval period

    // S3c: Honor collaboration request from the GUI system
    GUISystem::RegisterServiceRoutine(GUISystem::RedrawEvent, RedrawRoutine)
}

// Event Service Routines (D)

LMBDownRoutine(mousePosition) // D1: Left Mouse Button Down service routine
    HeroBall = CreateHeroBall(mousePosition)
    DefiningNewHeroBall = true
    GUISystem::GenerateRedrawEvent // D1L3: force a redraw event

LMBDragRoutine(mousePosition) // D2: Left Mouse Button Drag service routine
    RefineRadiusAndVelocityOfHeroBall(mousePosition)
    SetSliderBarsWithHeroBallVelocity()
    GUISystem::GenerateRedrawEvent // Generates a redraw event

LMBUpRoutine(mousePosition) // D3: Left Mouse Button Up service routine
    InsertHeroBallToAllWorldBalls()
    DefiningNewHeroBall = false

RMBDownRoutine(mousePosition) // D4: Right Mouse Button Down service routine
    HeroBall = SelectHeroBallBasedOn(mousePosition)
    if (HeroBall != null) SetSliderBarsWithHeroBallVelocity()

SliderBarRoutine(sliderBarValues) // D5: Slider Bar changes service routine
    if (HeroBall != null)
        SetSliderBarsWithHeroBallVelocity(sliderBarValues)

ServiceTimer() // D6: Timer expired service routine
    UpdateSimulation() // Move balls by velocities and remove off-screen ones
    EchoToStatusBar() // Sets status bar with number of balls on screen
    GUISystem::GenerateRedrawEvent // Generates a redraw event
    if (HeroBall != null) // Reflect proper HeroBall velocity
        SetSliderBarsWithHeroBallVelocity(sliderBarValues)

RedrawRoutine() // D7: Redraw event service routine
    DrawBalls(AllWorldBalls)
    if (DefiningNewHeroBall)
        DrawBalls(HeroBall) // Draw the new HeroBall that is being defined


Figure 19.8. Programming solution based on the event-driven programming model.
GenerateRedrawEvent requests in D1 and D2 are really not necessary. The
servicing of our timer events will guarantee us an up-to-date display screen at
all times.


19.2.4 Implementation Notes 

The application state of an event-driven program must persist over the entire
lifetime of the program. In terms of implementation, this means that the
application state must be stored in variables that outlive individual function
invocations, typically variables that are dynamically allocated during run time
and that reside in heap memory. These are in contrast to local variables, which
reside in stack memory and do not persist across function invocations.

The mapping of user actions to events in the GUI system often results in
implicit and/or undefined events. In our ball shooting program, the actions to define
a HeroBall involve left mouse button down and drag. When mapping these
actions to events in our implementation (in Figure 19.4 and Figure 19.8), we realize
that we should also pay attention to the implicit mouse button up event. Another
example is the HeroBall selection action: right mouse button down. In this case,
right mouse button drag and up events are not serviced by our application, and
thus, they are undefined (to our application).

When one user action (e.g., “drag out the HeroBall”) is mapped to a group 
of consecutive events (e.g., mouse button down, then drag, then up) a finite state 
diagram can usually be derived to help design the solution. Figure 19.9 depicts 
the finite state diagram for defining the HeroBall in our ball shooting program. 



Figure 19.9. State diagram for defining the HeroBall. 
The left mouse button down event puts the program into State 1 where, in our
solution from Figure 19.8, LMBDownRoutine() implements this state and defines
the center of the HeroBall, etc. In this case the transition between states is
triggered by the mouse events, and we see that it is physically impossible to move
from State 2 back to State 1. However, we do need to handle the case where the
user action causes a transition from State 1 to State 3 directly (mouse button down
and release without any dragging actions). This state diagram helps us analyze
possible combinations of state transitions and perform appropriate initializations.
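
A state diagram of this kind translates naturally into a transition table. The sketch below is one possible encoding under stated assumptions: States 1 to 3 follow the description above, while the state names, the IDLE resting state, and the event names are hypothetical choices made for illustration.

```python
from enum import Enum


class State(Enum):
    IDLE = 0         # assumed resting state before any button press
    CENTER_SET = 1   # State 1: button down, HeroBall center defined
    DRAGGING = 2     # State 2: radius and velocity being refined
    RELEASED = 3     # State 3: button up, HeroBall definition complete


# (current state, event) -> next state; pairs not listed leave the state unchanged
TRANSITIONS = {
    (State.IDLE, "LMBDown"): State.CENTER_SET,
    (State.CENTER_SET, "LMBDrag"): State.DRAGGING,
    (State.CENTER_SET, "LMBUp"): State.RELEASED,   # click and release without dragging
    (State.DRAGGING, "LMBDrag"): State.DRAGGING,
    (State.DRAGGING, "LMBUp"): State.RELEASED,
    (State.RELEASED, "LMBDown"): State.CENTER_SET,
}


def step(state, event):
    """Return the next state, or the same state if the event has no transition."""
    return TRANSITIONS.get((state, event), state)


def run(events, state=State.IDLE):
    for event in events:
        state = step(state, event)
    return state


print(run(["LMBDown", "LMBDrag", "LMBDrag", "LMBUp"]))
print(run(["LMBDown", "LMBUp"]))
```

Listing the transitions explicitly makes the direct State 1 to State 3 path hard to overlook, and the table contains no entry leading from State 2 back to State 1.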

Event-driven applications interface with the user through physical (e.g., mouse
clicks) or simulated GUI elements (e.g., quit button, slider bars). An input GUI
element (e.g., the quit button) is an artifact (e.g., an icon) for the user to direct
changes to the application state, while an output GUI element (e.g., the status
bar) is an avenue for the application to present application state information to
the user as feedback. For both types of elements, information only flows in one
direction: either from the user to the application (input) or from the application
to the user (output). When working with GUI elements that serve both input and
output purposes, special care is required. For example, after the user selects or
defines a HeroBall, the slider bars reflect the velocity of the free-falling HeroBall
(output), while at any time, the user can manipulate the slider bars to alter the
HeroBall velocity (input). In this case, the GUI element’s displayed state and the
application’s internal state are connected. The application must ensure that these two
states are consistent. Notice that in the solution shown in Figure 19.4, this state
consistency is not maintained. When a user clicks the RMB (B2 in Figure 19.4)
to select a HeroBall, the slider bar values are updated properly; however, as the
HeroBall free falls under gravity, the slider bar values are not updated. The
solution presented in Figure 19.8 fixes this problem by using the ServiceTimer()
function.
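
The two directions of information flow can be kept apart explicitly, as in the following sketch. It is only an illustration: the SliderBar and Ball classes, the single velocity component, and the gravity constant are assumptions introduced here and are not part of the solution in Figure 19.8.

```python
GRAVITY = -9.8   # assumed acceleration, in model units per second squared


class Ball:
    def __init__(self):
        self.velocity_y = 0.0


class SliderBar:
    """Hypothetical GUI element used for both input and output."""

    def __init__(self):
        self.value = 0.0
        self.on_change = None   # service routine for user-initiated changes

    def user_moves_to(self, value):
        # Input direction: the user changes the slider, the application is told.
        self.value = value
        if self.on_change is not None:
            self.on_change(value)

    def set_from_application(self, value):
        # Output direction: update the display without firing on_change.
        self.value = value


def service_timer(ball, slider, dt):
    ball.velocity_y += GRAVITY * dt                # update the application state
    slider.set_from_application(ball.velocity_y)   # keep the displayed state consistent


hero_ball = Ball()
slider = SliderBar()
slider.on_change = lambda v: setattr(hero_ball, "velocity_y", v)

slider.user_moves_to(5.0)               # input: slider drives the ball
service_timer(hero_ball, slider, 0.5)   # output: ball drives the slider
print(slider.value == hero_ball.velocity_y)
```

Keeping a separate application-side setter that does not trigger the change routine is one common way to avoid feedback loops between the two directions.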

Event service routines are functions defined in our program that cause a
callback from the MainEventLoop in the presence of relevant events. For this reason,
these service routines are also referred to as callback functions. The application
program registers callback functions with the GUI system by passing the address
of the function to the GUI system. This is the registration mechanism implied in
Figure 19.7 and Figure 19.8. Simple GUI systems (e.g., GLUT or FLTK) usually
support this form of registration mechanism. The advantage of this mechanism is
that it is easy to understand, straightforward to program, and often contributes to
a small memory footprint in the resulting program. The main disadvantage of this
mechanism is the lack of organizational structure for the callback functions.

In commercial GUI systems, there are a large number of events with which
user applications must deal, and a structured organization of the service routines
can assist the programmability of the GUI system. Modern commercial GUI
systems are often implemented based on object-oriented languages (e.g., C++ for
MFC, Java for Java Swing). For these systems, many event service registrations
are implemented as subclasses of an appropriate GUI system class, and they
override corresponding virtual functions. In this way, the event service routines are
organized according to the functionality of GUI elements. The details of different
registration mechanisms will be explained in Section 19.4.1 when we describe the
implementation details.
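
The subclass-and-override style of registration can be sketched as follows. The Window base class, its handler names, and the dispatch() method are hypothetical; they imitate the general shape of an object-oriented GUI system and are not the MFC or Java Swing API.

```python
class Window:
    """Hypothetical GUI system class with default event handlers."""

    def on_left_button_down(self, x, y):
        return "default"    # default GUI system behavior

    def on_paint(self):
        return "default"

    def dispatch(self, event, *args):
        # The GUI system's main event loop would call this for each event.
        handler = getattr(self, "on_" + event)
        return handler(*args)


class BallWindow(Window):
    """User application: overriding a handler is the registration."""

    def __init__(self):
        self.hero_ball_center = None

    def on_left_button_down(self, x, y):
        self.hero_ball_center = (x, y)
        return "handled"


window = BallWindow()
print(window.dispatch("left_button_down", 3, 4))   # serviced by the override
print(window.dispatch("paint"))                    # falls back to the default
```

Handlers that are not overridden keep the default behavior of the base class, so the service routines end up grouped by the GUI element that owns them.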

Event service routines (or callback functions) are simply functions in our
program. However, these functions also serve the important role of servicing
external asynchronous events. The following are guidelines one should take into
account when implementing event service routines:

1. An event service routine should only service the triggering event and
immediately return the control back to the MainEventLoop(). This may seem to
be a “no-brainer.” However, because of our familiarity with control-driven
programming, it is often tempting to anticipate/poll subsequent events with
a control structure in the service routine. For example, when servicing the
left mouse button down event, we know that the mouse drag event will
happen next. After allocating and defining the circle center, we have properly
initialized data to work with the HeroBall object. It may seem easier to
simply include a while loop to poll and service mouse drag events. However,
with all the other external events that may happen (e.g., timer event,
external redraw events, etc.), this monopolizing of control in one service
routine is not only a bad design decision, but it may also cause the program
to malfunction.

2. An event service routine should be stateless, and individual invocations
should be independent. In terms of implementation, this essentially means
event service routines should not define local static variables that record
data from previous invocations. Because we have no control over when,
or how often, events are triggered, using such variables as data or as
conditions for changing the application state can easily lead to disastrous
and unnecessarily complex solutions. We can always define extra
state variables in the application state to record temporary state information
that must persist over multiple event services. The DefiningNewHeroBall
flag in Figure 19.8 is one such example.

3. An event service routine should check for invocation conditions
regardless of common sense logical sequence. For example, although logically,
a mouse drag event can never happen unless a mouse down event has
already occurred, in reality, a user may depress a mouse button from outside
of our application window and then drag the mouse into our application
window. In this case, we will receive a mouse drag event without the
corresponding mouse down event. For this reason, the mouse drag service
routine should check the invocation condition that the proper initialization
has indeed happened. Notice that in Figure 19.8, we do not include proper
invocation condition checking. For example, in the LMBDragRoutine(), we
do not verify that LMBDownRoutine() has been invoked (by checking the
DefiningNewHeroBall flag). In a real system, this may cause the program
to malfunction and/or crash.
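
The invocation check described in guideline 3 can be sketched as below. This is a hypothetical Python rendering of the D1-D3 routines from Figure 19.8 with guards added; the ApplicationState class, the dictionary representation of a ball, and the boolean return values are assumptions made for illustration.

```python
import math


class ApplicationState:
    def __init__(self):
        self.all_world_balls = []
        self.hero_ball = None
        self.defining_new_hero_ball = False   # transient state spanning several events


def lmb_down_routine(state, position):
    state.hero_ball = {"center": position, "radius": 0.0}
    state.defining_new_hero_ball = True


def lmb_drag_routine(state, position):
    # Guard: a drag may arrive without a preceding button-down in our window.
    if not state.defining_new_hero_ball:
        return False
    cx, cy = state.hero_ball["center"]
    state.hero_ball["radius"] = math.hypot(position[0] - cx, position[1] - cy)
    return True


def lmb_up_routine(state, position):
    # Guard: ignore a release that does not end a HeroBall definition.
    if not state.defining_new_hero_ball:
        return False
    state.all_world_balls.append(state.hero_ball)
    state.defining_new_hero_ball = False
    return True


state = ApplicationState()
stray_drag_serviced = lmb_drag_routine(state, (5, 5))   # drag that began outside the window
lmb_down_routine(state, (0, 0))
lmb_drag_routine(state, (3, 4))
lmb_up_routine(state, (3, 4))
print(stray_drag_serviced, len(state.all_world_balls))
```

Each routine services only its own event and returns immediately, and the transient flag lives in the application state rather than in a static local variable, in line with guidelines 1 and 2.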


19.2.5 Summary 

In this section we have discussed programming models or strategies for
organizing statements of our program. We have seen that for interactive applications,
where an application continuously waits and reacts to a user’s input actions,
organizing the program statements based on designing control structures results in
complex and inefficient programs. Existing GUI systems analyze all possible
user actions, design control structures to interact with the user, implement default
behaviors for all user actions, and provide this functionality in GUI APIs. To
develop interactive applications, we take advantage of the existing control
structure in the GUI API (i.e., the MainEventLoop()) and modify the default behaviors
(via event service routines) of user actions. In order to properly collaborate with
existing GUI APIs, the strategy for organizing the program statements should be
based on specifying user actions that cause changes to the application state.

Now that we understand how to organize the statements of our program, let’s 
examine strategies for organizing functional modules of our solution. 


19.3 The Model-View-Controller Architecture

The event-driven ball shooting program presented in Section 19.2.3 and
Figure 19.8 addresses programmability and efficiency issues when interacting with a
user. In the development of that model, we glossed over many supporting
functions (e.g., UpdateSimulation()) needed in our solution. In this section, we
develop strategies for organizing these functions. Notice that we are not interested
in the implementation details of these functions. Instead, we are interested in
grouping related functions into components. We then pay attention to how the
different components collaborate to support the functionality of our application.
In this way, we derive a framework that is suitable for implementing general
interactive graphics applications. With a proper framework guiding our design and
implementation, we will be better equipped to develop programs that are easier to
understand, maintain, modify, and expand.


19.3.1 The Model-View-Controller Framework

Based on our experience developing solutions in Section 19.2, we understand that
interactive graphics applications can be described as applications that allow users
to interactively update their internal states. These applications provide real-time
visualization of their internal states (e.g., the free-falling balls) with computer
graphics (e.g., drawing circles). The model-view-controller (MVC) framework
provides a convenient structure for discussing this type of application. In the
MVC framework, the model is the application state, the view is responsible for
setting up support for the model to present itself to the user, and the controller
is responsible for providing the support for the user to interact with the model.
Within this framework, our solution from Figure 19.8 is simply the
implementation of a controller. In this section, we will develop the understanding of the other
two components in the MVC framework and how these components collaborate
to support interactive graphics applications.

Figure 19.10 shows the details of an MVC framework to describe the behavior
of a typical interactive graphics application. We continue to use the ball shooting
program as our example to illustrate the details of the components. The top-right
rectangular box is the model, the bottom-right rectangular box is the view, and
the rectangular box on the left is the controller component. These three boxes
represent program code we, as application developers, must develop. The two
dotted rounded boxes represent external graphics and GUI APIs. These are the
external libraries that we will use as a base for building our system. Examples of
popular graphics APIs include OpenGL, Microsoft Direct3D (D3D), and Java 3D,
among others. As mentioned in Section 19.2.2, examples of popular GUI APIs
include GLUT, FLTK, MFC, and the Java Swing Library.

The model component defines the persistent application state (e.g.,
AllWorldBalls, HeroBall, etc.) and implements interface functions for this application
state (e.g., UpdateSimulation()). Since we are working with a “graphics”
application, we expect graphical primitives to be part of the representation for the
application state (e.g., CirclePrimitives). This fact is represented in Figure 19.10
by the application state (the ellipse) partially covering the Graphics API box. In
the rest of this section, we will use the terms model and persistent application
state interchangeably.

Figure 19.10. Components of an interactive graphics application.

The view component is in charge of drawing to the drawing area on the
application window (e.g., drawing the free-falling balls). More specifically, the view
component is responsible for initializing the graphics API transformation such
that drawing of the model’s graphical primitives will appear in the appropriate
drawing area. The arrow from the view to the model component signifies that the
actual application state redraw must be performed by the model component. Only
the model component knows the details of the entire application state (e.g., size
and location of the free-falling circles), so only the model component can redraw
the entire application. The view component is also responsible for transforming
user mouse click positions to a coordinate system that the model understands
(e.g., mouse button clicks for dragging out the hero ball).

The top-left external events arrow in Figure 19.10 shows that all external
events are handled by the MainEventLoop(). The relevant events will be forwarded
to the event service routines in the controller component. Since the controller
component is responsible for interacting with the user, its design is typically
based on event-driven programming techniques. The solution presented in
Section 19.2.3 and Figure 19.8 is an example of a controller component implementation.
The arrow from the controller to the model indicates that most external
events eventually change the model component (e.g., creating a new HeroBall or
changing the current HeroBall velocity). The arrow from the controller to the
view component indicates that the user input point transformation is handled by
the view component. Controllers typically return mouse click positions in device
coordinates with the origin at the top-left corner. In the application model, it is
more convenient for us to work with a coordinate system with a lower-left origin.
The view component with its transformation functionality has the knowledge to 
perform the necessary transformation. 
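
The device-to-world conversion described above can be sketched as follows. This is a minimal illustration rather than the implementation used by the ball shooting program: the Point2 and ViewArea types, and the assumption that the drawing area maps linearly onto a world-space rectangle, are hypothetical.

```cpp
#include <cassert>

// Hypothetical 2D point type used only for this sketch.
struct Point2 { float x, y; };

// Hypothetical description of a drawing area: its size in pixels and the
// world-space rectangle it displays (world origin at the lower-left corner).
struct ViewArea {
    int   widthPixels, heightPixels;
    float worldMinX, worldMinY, worldWidth, worldHeight;
};

// Convert a device point (origin at top-left, y grows downward) into
// world coordinates (origin at lower-left, y grows upward).
Point2 DeviceToWorldXform(const ViewArea& area, Point2 device) {
    assert(area.widthPixels > 0 && area.heightPixels > 0);
    float u = device.x / area.widthPixels;                         // 0 at left, 1 at right
    float v = (area.heightPixels - device.y) / area.heightPixels;  // flip the y-axis
    return { area.worldMinX + u * area.worldWidth,
             area.worldMinY + v * area.worldHeight };
}
```

A controller would pass each mouse position through such a function before calling into the model, so the model never sees device coordinates.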

Since the model must understand the transformation set up by the view, it is 
important that the model and the view components are implemented based on the 
same Graphics API. However, this sharing of an underlying supporting API does 
not mean that the model and view are an integrated component. On the contrary, 
as will be discussed in the following sections, it is advantageous to clearly dis¬ 
tinguish between these two components and to establish well-defined interfaces 
between them. 


19.3.2 Applying MVC to the Ball Shooting Program 

With the described MVC framework and the understanding of how responsibili¬ 
ties are shared among the components, we can now extend the solution presented 
in Figure 19.8 and complete the design of the ball shooting program. 


The Model 

The model is the application state and thus this is the core of our program. When 
describing approaches to designing an event-driven program in Section 19.2.3, 
the first two points mentioned were: 

1. define the application state, and 

2. describe how a user changes this application state. 

These two points are the guidelines for designing the model component. In
an object-oriented environment, the model component can be implemented as
classes, and the state of the application can be implemented as instance variables,
with “how a user changes this application state” implemented as methods of the
classes.

Figure 19.11 shows that the instance variables representing the state are typ¬ 
ically private to the model component. As expected, we have a “very graphical” 
application state. To properly support this state, we define the CirclePrimitive 
class based on the underlying graphics API. The CirclePrimitive class supports the 
definition of center, radius, drawing, and moving of the circle, etc. Figure 19.11 
also shows the four categories of methods that a typical model component must 
support.

    class ApplicationModel {
    private:
        // Application's private state
        // (CirclePrimitive is a class built on the graphics API)
        vector<CirclePrimitive> AllWorldBalls    // World balls, initially empty
        CirclePrimitive         HeroBall         // The hero ball
        bool                    DefiningNewHero  // If LMB drag is true

    public:
        // 1. Application state inquiries
        bool  IsDefinningHeroBall()
              // If LMB drag is true, same as if we are in
              // the middle of defining a new hero ball
        bool  HeroBallExists()
              // Current hero ball is not null
        int   NumBallsOnScreen()
              // Number of balls currently on screen
        float HeroVelocityX()
              // Hero's velocity, x-component
        float HeroVelocityY()
              // Hero's velocity, y-component

        // 2. Application state changes from user events
        void  CreateHeroBall(mousePosition)
              // Creates new hero ball, with center at mousePosition;
              // radius and velocity are initialized to zero
        void  DragHeroBallTo(mousePosition)
              // Refine radius and velocity of hero ball based on
              // hero's center and current mousePosition
        void  SetHeroBallVelocity(velocityX, velocityY)
              // Sets current hero ball velocity;
              // if there is no current hero ball, nothing happens
        void  InsertHeroToAllWorld()
              // Done defining HeroBall, insert into WorldBallSet
        void  SelectHeroBall(mousePosition)
              // Sets hero ball to be the one currently under
              // mousePosition; sets to null if none exists

        // 3. Application state changes from application events
        void  UpdateSimulation()
              // Move balls by their velocities, update velocity
              // by gravity, and remove off-screen ones

        // 4. Application state visualization
        void  RedrawApplicationState()
              // Draw all the free falling balls (including the HeroBall)
              // to the desired region on the application window.
    }

Figure 19.11. The model component of the ball shooting program.


1. Application state inquiries. These are functions that return the application
state. These functions are important for maintaining up-to-date GUI
elements (e.g., status echo or velocity slider bars).

2. Application state changes from user events. These are functions that
change the application state according to a user's input actions. Notice that
the function names should reflect the functionality (e.g., CreateHeroBall)
and not the user event actions (e.g., ServiceLMBDown). It is common for
a group of functions to support a defined finite state transition. For example,
CreateHeroBall, DragHeroBallTo, and InsertHeroToAllWorld implement the
finite state diagram of Figure 19.9.

3. Application state changes from application (timer) events. This is a 
function that updates the application state resulting from purposeful and 
usually synchronous application timer events. For the ball shooting pro¬ 
gram, we update all of the velocities, displace the balls’ positions by the 
updated velocities and compute ball-to-ball collisions, as well as remove 
off-screen balls. 

4. Application state visualization. This is a function that knows how to draw 
the application state (e.g., drawing the necessary number of circles at the 
corresponding positions). It is expected that a view component will initial¬ 
ize appropriate regions on the application window, set up transformations, 
and invoke this function to draw into the initialized region. 
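
To make the first three categories above concrete, here is a reduced sketch of a model class. It is only an illustration under simplifying assumptions: the Ball struct stands in for CirclePrimitive, InsertBall and BallAt are hypothetical helpers, the gravity constant is an assumed value, ball-to-ball collisions are omitted, and "off-screen" is simplified to "entirely below y = 0". The visualization category is left out because it depends on the graphics API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the CirclePrimitive class of Figure 19.11.
struct Ball {
    float cx, cy;   // center
    float radius;
    float vx, vy;   // velocity in world units per second
};

class ApplicationModel {
public:
    // 1. Application state inquiries
    std::size_t NumBallsOnScreen() const { return balls_.size(); }
    const Ball& BallAt(std::size_t i) const { return balls_.at(i); }

    // 2. Application state changes from user events
    void InsertBall(const Ball& b) { balls_.push_back(b); }

    // 3. Application state changes from application (timer) events:
    //    update velocities by gravity, move the balls, drop off-screen ones.
    void UpdateSimulation(float dt) {
        assert(dt >= 0.0f);
        std::vector<Ball> kept;
        for (Ball b : balls_) {
            b.vy += kGravity * dt;
            b.cx += b.vx * dt;
            b.cy += b.vy * dt;
            if (b.cy + b.radius >= 0.0f) kept.push_back(b);  // not yet below the floor
        }
        balls_ = kept;
    }

private:
    static constexpr float kGravity = -9.8f;  // assumed value, world units/s^2
    std::vector<Ball> balls_;                 // private application state
};
```

Note that the public function names describe what happens to the state, not which mouse button or GUI element triggered the change.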

It is important to recognize that the user’s asynchronous events are arriving in 
between synchronous application timer events. In practice, a user observes an 
instantaneous application state (the graphics in the application window) and gen¬ 
erates asynchronous events to alter the application state for the next round of 
simulation. For example, a user sees that the HeroBall is about to collide with 
another ball and decides to change the HeroBall’s velocity to avoid the collision 
that would have happened in the next round of simulation. This means that, before
each synchronous timer update, we must ensure that all pending asynchronous user
events are processed. In addition, the application should provide continuous feedback to
ensure that users are observing an up-to-date application state. This subtle handling
of event arrival and processing order is not an issue for simple, single-user
applications like our ball shooting program. On large-scale multi-user networked
interactive systems, where input event and output display latencies may be significant,
the UpdateSimulation() function is often divided into pre-update, update,
and post-update.
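
The ordering requirement can be sketched as a timer service routine that drains the queue of user events before advancing the simulation. This is a hypothetical illustration: the SimulationDriver class, its event queue, and the reduction of the model to a single hero position and velocity are our assumptions, not part of the ball shooting program.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>

// Hypothetical frame driver: user events are queued as they arrive and are
// all applied before the next synchronous simulation step.
class SimulationDriver {
public:
    using UserEvent = std::function<void(float& heroVelocityX)>;

    // Called asynchronously (between timer events) whenever the user acts.
    void PostUserEvent(UserEvent e) { pending_.push(std::move(e)); }

    // Called on each timer event.
    void ServiceTimer(float dt) {
        assert(dt >= 0.0f);
        // Pre-update: apply every pending user event first.
        while (!pending_.empty()) {
            pending_.front()(heroVelocityX_);
            pending_.pop();
        }
        // Update: advance the simulation using the up-to-date state.
        heroX_ += heroVelocityX_ * dt;
        // Post-update: a real application would refresh views and GUI echoes here.
        ++updates_;
    }

    float HeroX() const { return heroX_; }
    int   Updates() const { return updates_; }

private:
    std::queue<UserEvent> pending_;
    float heroX_ = 0.0f;
    float heroVelocityX_ = 1.0f;
    int   updates_ = 0;
};
```

Because the queue is emptied at the start of ServiceTimer, a velocity change made by the user after seeing one frame always takes effect in the very next simulation step.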

The View 

Figure 19.12 shows the ApplicationView class supporting the two main functionalities
of a view component: coordinate space transformation and initialization
for redraw. As discussed earlier, the controller is responsible for calling
DeviceToWorldXform() to communicate user input points to the model component.
The Viewport class is introduced to encapsulate the highly API-dependent
device initialization and transformation procedures.

    class Viewport {
    private:
        // An area on the application window for drawing.
        // Actual implementation of the viewport is graphics API dependent.
    public:
        void EraseViewport()
             // Erase the area on the application window
        void ActivateViewportForDrawing()
             // All subsequent graphics API draw commands
             // will show up on this viewport
    }

    class ApplicationView {
    private:
        // A view's private state information
        Viewport TargetDrawArea
             // An area of the application main window that
             // this view will be drawing to
    public:
        void DeviceToWorldXform(inputDevicePoint, outputModelPoint)
             // Transform the input device coordinate point to an output
             // point in a coordinate system that the model understands
        void DrawView(ApplicationModel TheModel)
             // Erase and activate the TargetDrawArea, then
             // set up the transformation for TheModel and
             // call TheModel.RedrawApplicationState() to draw all the balls.
    }

Figure 19.12. The view component of the ball shooting program.


The Controllers 

We can improve the solution of Figure 19.8 to better support the specified func¬ 
tionality of the ball shooting program. Recall that the application window de¬ 
picted in Figure 19.2 has two distinct regions for interpreting events: the upper 
application drawing area where mouse button events are associated with defin¬ 
ing/selecting the HeroBall and the lower GUI element area where mouse button 
events on the GUI elements have different meanings (e.g., mouse button events 
on the slider bars generate SliderBarChange events, etc.). We also notice that the
upper application drawing area is the exact same area where the ApplicationView
must direct the drawings of the ApplicationModel state.

Figure 19.13 introduces two types of controller classes: a ViewController and
a GenericController. Each controller class is dedicated to receiving input events
from the corresponding region on the application window. The ViewController
creates an ApplicationView during initialization such that the view can be tightly
paired for drawing of the ApplicationModel state in the same area. In addition,
the ViewController class also defines the appropriate mouse event service routines
to support the interaction with the HeroBall. The GenericController is meant to
contain GUI elements for interacting with the application state.

    // Controller with a view and application state
    class ViewController {
    private:
        ApplicationModel TheModel = null  // Reference to the application state
        ApplicationView  TheView  = null  // For drawing to the desired region
    public:
        void InitializeController(ApplicationModel aModel, anArea) {
            // Define and initialize the application state
            TheModel = aModel
            // Create a new view for the specified area on the application window
            TheView = new ApplicationView(anArea)
            // Register event service routines
            GUISystem::RegisterServiceRoutine(GUISystem::LMBDown, LMBDownRoutine)
            GUISystem::RegisterServiceRoutine(GUISystem::LMBDrag, LMBDragRoutine)
            GUISystem::RegisterServiceRoutine(GUISystem::LMBUp, LMBUpRoutine)
            GUISystem::RegisterServiceRoutine(GUISystem::RMBDown, RMBDownRoutine)
            GUISystem::RegisterServiceRoutine(GUISystem::RedrawEvent, RedrawRoutine)
        }
        // Event service routines
        // ... define the 5 event routines similar to the ones in Figure 19.8 ...
    }

    // Controller with no view
    class GenericController {
    private:
        ApplicationModel TheModel = null  // Reference to the application state
    public:
        void InitializeController(ApplicationModel aModel, anArea) {
            TheModel = aModel
            // Register event service routines
            GUISystem::RegisterServiceRoutine(GUISystem::SliderBar, SliderBarRoutine)
            GUISystem::DefineTimerPeriod(SimulationUpdateInterval)
            GUISystem::RegisterServiceRoutine(GUISystem::TimerEvent, ServiceTimer)
        }
        // Event service routines
        // ... define the 2 event routines similar to the ones in Figure 19.8 ...
    }

    // Application initialization
    // GUI API: MainEventLoop will call this function to initialize our application
    SystemInitialization() {
        ApplicationModel aModel = new ApplicationModel()
        ViewController aViewController = new ViewController()
        GenericController aGenericController = new GenericController()

        aViewController.InitializeController(aModel, drawingAreaOfWindow)
        aGenericController.InitializeController(aModel, uiAreaOfWindow)
    }

Figure 19.13. The controller component of the ball shooting program.


The bottom of Figure 19.13 illustrates that the GUI API MainEventLoop will
still call the SystemInitialization() function to initialize the application. In this
case, we create one instance each of ViewController and GenericController. The
ViewController is initialized to monitor mouse button events in the drawing area
of the application window (e.g., LMB click to define HeroBall), while the GenericController
is initialized to monitor the GUI element state changes (e.g., LMB
dragging of a slider bar). Notice that the service of the timer event is global to the
entire application and should be defined in only one of the controllers (either one
will do).

In practice, the GUI API MainEventLoop dispatches events to the controllers
based on the context of the event. The context of an event is typically defined by
the location of the mouse pointer or the current focus of the GUI element (i.e.,
which element is active). The application is responsible for creating a controller
for any region of the window that will receive events directly from the GUI
API.


19.3.3 Using the MVC to Expand the Ball Shooting Program 

One interesting characteristic of the MVC solution presented in Section 19.3.2 
is that the model component does not have any knowledge of the view or the 
controller components. This clean interface allows us to expand our solution by 
inserting additional view/controller pairs. 

For example, Figure 19.14 shows an extension to the ball shooting program
given in Figure 19.2. It has an additional small view in the UI (user interface) area 
next to the quit button. The small view is exactly the same as the original large 
view, except that it covers a smaller area on the application window. 

Figure 19.15 shows that, with our MVC solution design, we can implement
the small view by creating a new instance of ViewController (an additional ApplicationView
will be created by the ViewController) for the desired application
window area. Notice that the GenericController’s window area actually contains
the area of the small ViewController. When a user event is triggered in this area,
the “top-layer” controller (the visible one) will receive the event. After the initialization,
the new small view will behave in exactly the same manner as the original
large view.

    // GUI API: MainEventLoop will call this function to initialize our application
    SystemInitialization() {
        ApplicationModel aModel = new ApplicationModel()
        ViewController aLargeViewController = new ViewController()
        GenericController aGenericController = new GenericController()

        aLargeViewController.InitializeController(aModel, drawingAreaOfWindow)
        aGenericController.InitializeController(aModel, uiAreaOfWindow)

        // New instance of ViewController (and ApplicationView)
        ViewController aSmallViewController = new ViewController()
        aSmallViewController.InitializeController(aModel, smallViewDrawingArea)
    }

Figure 19.15. Implementing the small view for the ball shooting program.
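
The “top-layer” rule for overlapping controller regions can be sketched as a simple hit test. A real GUI API performs this dispatch internally; the Region and ControllerEntry types and the DispatchMouseEvent function below are hypothetical names introduced only to illustrate the idea, and the pixel rectangles used in any example are made up.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical rectangular window region in device coordinates.
struct Region {
    int x, y, width, height;
    bool Contains(int px, int py) const {
        assert(width >= 0 && height >= 0);
        return px >= x && px < x + width && py >= y && py < y + height;
    }
};

// Hypothetical controller record: a name and the region it monitors.
struct ControllerEntry {
    std::string name;
    Region region;
};

// Return the name of the controller that should receive a mouse event at
// (px, py). Entries are ordered bottom-to-top, so the last match wins.
std::string DispatchMouseEvent(const std::vector<ControllerEntry>& controllers,
                               int px, int py) {
    for (auto it = controllers.rbegin(); it != controllers.rend(); ++it) {
        if (it->region.Contains(px, py)) return it->name;
    }
    return "";  // no controller covers this point
}
```

With the small view's region listed after the GenericController's region, a click inside the small view reaches the small ViewController even though the point also lies inside the GenericController's area.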

For simplicity, Figure 19.14 shows two identical view/controller pairs. In general,
a new view/controller pair is created to present a different visualization of the
application state. For example, with slight modifications to the view component’s 
transformation functionality, the large view of Figure 19.14 can be configured 
into a zoom view and the small view can be configured into a work view, where 
the zoom view can zoom into different regions (e.g., around the HeroBall) and the 
work view can present the entire application space (e.g., all the free falling balls). 

Figure 19.16 shows the components of the solution in Figure 19.15 and how 
these components interact. We see that the model component supports the op¬ 
erations of all the view and controller components and yet it does not have any 
knowledge of these components. This distinct and simple interface has the fol¬ 
lowing advantages: 

1. simplicity. The model component is the core of the application and usually
is the most complicated component. By keeping the design of this compo¬ 
nent independent from any particular controller (user input/events) or view 
(specific drawing area), we can avoid unnecessary complexity. 

2. portability. The controller component typically performs the translation 
of user actions to model-specific function calls. The implementation of this 
translation is usually simple and specific to the underlying GUI API. Keep¬ 
ing the model clean from the highly API-dependent controller facilitates 
portability of a solution to other GUI platforms. 

3. expandability. The model component supports changing of its internal
state and understands how to draw its contents. As we have seen (Figures
19.15 and 19.16), this means that it is straightforward to add new
view/controller pairs to increase the interactivity of the application.

Figure 19.16. Components of the ball shooting program with small view.


19.3.4 Interaction among the MVC Components 

The MVC framework is a tool for describing general interactive systems. One 
of the beauties of the framework is that it is straightforward to support multiple 
view/controller pairs. Each view/controller pair shares responsibilities in exactly 
the same way: the view presents the model and the controller allows the events 
(user-generated or otherwise) to change the model component. 

For an application with multiple view/controller pairs, like the one depicted 
in Figure 19.16, we see that a user can change the model component via any of 
the three controllers. In addition, the application itself is also capable of changing 
the model state. All components must, however, ensure that a coherent and up-to-date
presentation is maintained for the user. For example, when a user drags
out a new HeroBall, both the large and small view components must display the
dragging of the ball, while the GenericController component must ensure that the
slider bars properly echo the implicitly defined HeroBall velocity. In the classical
MVC model, the coherency among different components is maintained with an
elaborate protocol (e.g., via the observer design pattern). Although the classical
MVC model works very well, the elaborate protocol requires that all components
communicate with one another or otherwise keep track of changes in the model component.

In our case, and in the case of most modern interactive graphics systems, 
the application defines the timer event for simulation computation. To support 
smooth simulation results, we have seen that the timer event typically triggers 
within real-time response thresholds (e.g., 20-50 milliseconds). When servicing 
the timer events, our application can take the opportunity to maintain coherent 
states among all components. For example, in the ServiceTimer() function in
Figure 19.8, we update the velocity slider bars based on the current HeroBall velocity.
In effect, during each timer event service, the application pushes the up-to-date
model information to all components and forces the components to refresh
their presentation for the user. In this way, the communication protocol among 
the components becomes trivial. All components keep a reference to the model, 
and each view/controller pair in the application does not need to be aware of 
the existence of other view/controller pairs. In between periodic timer events, the 
user's asynchronous events change the model. These changes are only made in the 
model component, and no other components in the application need to be aware 
of the changes. During the periodic timer service, besides computing the model’s 
simulation update, all components poll the model for up-to-date state information. 
For example, when the user clicks and drags with the left mouse button pressed, a 
new HeroBall will be defined in the model component. During this time, the large 
and small view components will not display the new HeroBall, and the velocity 
slider bars will not show the new HeroBall’s velocity. These components will get
and display up-to-date HeroBall information only during the application timer 
event servicing. Since the timer event is triggered more than 30 times per second, 
the user will observe a smooth and up-to-date application state in all components 
at all times. 
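
The timer-driven alternative to an observer protocol can be sketched in a few lines. The Model and Presenter types below are hypothetical and reduce the application to a single velocity value; the point is only that each component polls the shared model during timer servicing and never talks to another component.

```cpp
#include <cassert>
#include <vector>

// Hypothetical minimal model: only the hero ball's x-velocity.
struct Model {
    float heroVelocityX = 0.0f;
};

// Hypothetical presentation component (a view or a slider echo) that caches
// what it last showed and refreshes itself by polling the model.
struct Presenter {
    float shownVelocityX = 0.0f;
    void Refresh(const Model& m) { shownVelocityX = m.heroVelocityX; }
};

// On each timer event every component polls the shared model; components
// never notify each other directly.
void ServiceTimer(const Model& model, std::vector<Presenter>& presenters) {
    for (Presenter& p : presenters) p.Refresh(model);
}
```

Between timer events the presenters may briefly show stale values, which is acceptable only because the timer fires at interactive rates.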


19.3.5 Applying the MVC Concept 

The MVC framework is applicable to general interactive systems. As we have 
seen in this section, interactive systems with the MVC framework result in clearly 
defined component behaviors. In addition, with clearly defined interfaces among 
the components, it becomes straightforward to expand the system with additional 
view/controller pairs. 

An interactive system does not need to be an elaborate software application. 
For example, the slider bar is a fully functional interactive system. The model 
component contains a current value (typically a floating point number), the view 
component presents this value to the user, and the controller allows the user to interactively
change this value. A typical view component draws rectangular icons
(bar and knobs) representing the current value in the model component, while 
the controller component typically supports mouse down and drag events to in¬ 
teractively change the value in the model component. With this understanding, 
it becomes straightforward to expand the system with additional view/controller 
pairs. For example, in our ball shooting program, the slider bars have an addi¬ 
tional view component where the numeric value of the model is displayed. In 
this case, there is no complementary controller component defined for the nu¬ 
meric view; an example complementary controller would allow the user to type 
in numeric values. 
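
The slider bar viewed as a small MVC system might look like the following sketch. All names here (SliderModel, KnobPixel, NumericEcho, DragKnobTo) are hypothetical, and the pixel-based mapping is an assumption about how a horizontal bar could be laid out.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Model: the slider's current value, clamped to [minValue, maxValue].
class SliderModel {
public:
    SliderModel(float minValue, float maxValue)
        : min_(minValue), max_(maxValue), value_(minValue) {
        assert(minValue < maxValue);
    }
    void  SetValue(float v) { value_ = std::min(std::max(v, min_), max_); }
    float Value() const { return value_; }
    float Min() const { return min_; }
    float Max() const { return max_; }
private:
    float min_, max_, value_;
};

// View 1: knob position in pixels along a bar of the given length.
int KnobPixel(const SliderModel& m, int barLengthPixels) {
    float t = (m.Value() - m.Min()) / (m.Max() - m.Min());
    return static_cast<int>(t * barLengthPixels + 0.5f);
}

// View 2: numeric echo of the same model value (truncated to an integer).
std::string NumericEcho(const SliderModel& m) {
    return std::to_string(static_cast<int>(m.Value()));
}

// Controller: translate a mouse drag position on the bar into a model value.
void DragKnobTo(SliderModel& m, int mousePixel, int barLengthPixels) {
    assert(barLengthPixels > 0);
    float t = static_cast<float>(mousePixel) / barLengthPixels;
    m.SetValue(m.Min() + t * (m.Max() - m.Min()));
}
```

Both views read the same model, so adding the numeric echo requires no change to the knob view or to the controller.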

19.4 Example Implementations 

Figure 19.17 shows two implementations of the solution presented in Section 
19.3.3. The version on the left is based on OpenGL and FLTK, while the ver¬ 
sion to the right is based on D3D and MFC. In this section, we present the details 
of these two implementations. The lessons we want to learn are that (a) a proper 
MVC solution framework should be independent from any implementation and 
(b) a well designed implementation should be realizable based on and/or easily 
ported to any suitable API. 

Before examining the details of each implementation, we will develop some 
understanding for working with modern GUI and graphics APIs. 

19.4.1 Working with GUI APIs 

Building the graphical user interface (GUI) of an application involves two distinct
steps. The first step is to design the layout of the user interface system. In this
step, an application developer places GUI elements (e.g., buttons, slider bars, etc.)
in an area that represents the application window. The GUI elements are typically
two-dimensional graphical artifacts (e.g., a 3D-looking icon representing a slider
bar). The goal of this first step is to arrange these graphical artifacts to achieve
user friendliness and maximum usability (e.g., what is the best place/color/size
for the slider bar, etc.). The second step in building a GUI for an application
is to semantically link the GUI elements to the functionality of the application
(e.g., update HeroBall velocity when the slider bar is dragged). In this step, an
application developer builds the code for the necessary functionality (e.g., code
for changing HeroBall velocity) and registers this code with the on-screen graphical
artifacts (e.g., the slider bar). This is precisely the event service registration
described in Section 19.2.2.

Figure 19.17. Ball shooting programs with OpenGL+FLTK and D3D+MFC.
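
The second step, linking a GUI element to application code, can be sketched with a small callback registry. This is not the interface of FLTK, MFC, or any other real GUI API: the GuiRegistry class, its method names, and the element names used with it are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical GUI element registry: maps an element name to the service
// routine that should run when that element generates an event.
class GuiRegistry {
public:
    using ServiceRoutine = std::function<void(float newValue)>;

    // Link an on-screen element to application code.
    void RegisterServiceRoutine(const std::string& element, ServiceRoutine r) {
        routines_[element] = std::move(r);
    }

    // Stand-in for the main event loop delivering one event.
    // Returns false when no routine is registered for the element.
    bool Deliver(const std::string& element, float newValue) const {
        auto it = routines_.find(element);
        if (it == routines_.end()) return false;
        it->second(newValue);
        return true;
    }

private:
    std::map<std::string, ServiceRoutine> routines_;
};
```

An application would register, for instance, a routine that sets the HeroBall velocity under the name of the velocity slider; the layout of the slider on screen is decided separately in the first step.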

Modern GUI APIs support the building of a graphical user interface with a 
GUI builder. A GUI builder is an interactive graphical editor that allows its user 
to interactively place and manipulate the appearances of GUI elements. In ad¬ 
dition, the GUI builder assists the application developer to compose or generate 
service routines and links those service routines to the events generated by the 
GUI elements. 

Figure 19.18 illustrates the mechanism by which the GUI builder (in the middle
of the figure) links the graphical user interface front-end (left side of the
figure) to the user-developed program code (right side of the figure). The patterned
ellipse, the GUI Builder, is shown in the middle of Figure 19.18. The arrow
pointing left towards the application (A Simple Program) indicates that the application
developer works with the GUI builder to design the layout of the application
(e.g., where to place the button or the status echo area). The arrows pointing
from the GUI builder toward the MainEventLoop and Event Service Linkage modules
indicate that the GUI builder is capable of generating programming code to
register event services. In Figure 19.18, there are two dotted connections between
the mouse and the button GUI element through the MainEventLoop module to the
event service linkage and the application controller modules. These two connections
represent the two different mechanisms with which GUI APIs support event
services:

Figure 19.18. Working with a GUI API.

1. External Service Linkage. Some GUI builders generate extra program 
modules (e.g., in the form of source code files) with code fragments sup¬ 
plied by the application developer to semantically link the GUI elements to 
the application functionality. For example, when the “button” of “A Simple 
Program” is clicked, the GUI builder ensures that a function in the “Event 
Service Linkage” module will be called. It is the application developer’s 
responsibility to insert code fragments into this function to implement the 
required action. 

2. Internal Direct Code Modification. Some GUI builders insert linkage 
programming code directly into the application source code. For example, 
the GUI builder modifies the source code of the application’s controller 
class and inserts a new function to be called when the “button” of “A Sim¬ 
ple Program” is clicked. Notice that the GUI Builder only inserts an empty 
function; the application developer is still responsible for filling in the de¬ 
tails of this new function. 

The advantage of an external service linkage mechanism is that the GUI builder 
only has minimal knowledge of the application source code. This provides a sim¬ 
ple and flexible development environment where the developer is free to organize 
the source code structure, variable names, etc., in any appropriate way. However, 
the externally generated programming module implies a loosely integrated envi¬ 
ronment. For example, to modify the “button” behavior of “A Simple Program,” 
the application developer must invoke the GUI builder, modify code fragments, 
and re-generate the external program module. The internal direct code modification
mechanism, in contrast, provides a better integrated environment where the
GUI builder modifies the application program source code directly. However, to
support proper “direct code modification,” the GUI builder must have intimate 
knowledge of, and often places severe constraints on, the application source code 
system (e.g., source code organization, file names, variable names, etc.). 

19.4.2 Working with Graphics APIs 

Figure 19.19 illustrates that one way to understand a modern graphics API is by
considering the API as a functional interface to the underlying graphics hardware.
It is convenient to consider this functional interface as consisting of two stages:
Graphics Hardware Context (GHC) and Graphics Device Context (GDC).

Figure 19.19. Working with a graphics API.

1. Graphics Hardware Context (GHC). This stage is depicted as the vertical
ellipse on the right of Figure 19.19. We consider the GHC as a configuration 
which wraps over the hardware video display card. An application creates a 
GHC for each unique configuration (e.g., depth of frame buffer or z-buffer, 
etc.) of the hardware video card(s). Many Graphics Device Contexts (see 
below) can be connected to each GHC to support drawing to multiple on¬ 
screen areas from the same application. 

2. Graphics Device Context (GDC). This stage is depicted as a cylindrical
pipe in Figure 19.19. The multiple pipes in the figure illustrate that an application
can create multiple GDCs to connect to the same GHC. Through
each GDC, an application can draw to distinct areas on the application window.
To properly support this functionality, each GDC represents a complete
rendering state. A rendering state encompasses all the information
that affects the final appearance of an image. This includes primitive attributes,
illumination parameters, coordinate transformations, etc. Examples
of primitive attributes are color, size, pattern, etc., while examples of
illumination parameters include light position, light color, surface material
properties, etc. Graphics APIs typically support coordinate transformation
with a series of two or three matrix processors. In Figure 19.19, the “M”
boxes inside the GDC pipes are the matrix processors. Each matrix processor
has a transformation matrix and transforms input vertices using this
matrix. Since these processors operate in series, together they are capable
of implementing multi-stage coordinate space transformations (e.g., object
to world, world to eye, and eye to projected space). The application must
load these matrix processors with appropriate matrices to implement a desired
transformation.
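
The idea of matrix processors operating in series can be sketched with 2D homogeneous matrices. This is a simplified illustration, not the interface of any graphics API: the Mat3 and Vec2 types and the function names are hypothetical, only two stages are shown, and the matrices are assumed to be affine (last row 0 0 1), so no perspective divide is performed.

```cpp
#include <array>
#include <cassert>

// 3x3 homogeneous matrix for 2D transformations, stored row-major.
using Mat3 = std::array<std::array<float, 3>, 3>;
struct Vec2 { float x, y; };

Mat3 Translate(float tx, float ty) {
    return {{{1.0f, 0.0f, tx}, {0.0f, 1.0f, ty}, {0.0f, 0.0f, 1.0f}}};
}

Mat3 Scale(float sx, float sy) {
    return {{{sx, 0.0f, 0.0f}, {0.0f, sy, 0.0f}, {0.0f, 0.0f, 1.0f}}};
}

// One "matrix processor": transform a vertex by the loaded matrix.
Vec2 Apply(const Mat3& m, Vec2 v) {
    return { m[0][0] * v.x + m[0][1] * v.y + m[0][2],
             m[1][0] * v.x + m[1][1] * v.y + m[1][2] };
}

// Two processors operating in series: object -> world, then world -> eye.
Vec2 RunPipeline(const Mat3& objectToWorld, const Mat3& worldToEye, Vec2 v) {
    return Apply(worldToEye, Apply(objectToWorld, v));
}
```

Loading a different worldToEye matrix changes what every subsequently drawn vertex looks like, which is why the view component owns the setup of these matrices.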

With this understanding, Figure 19.19 illustrates that to work with a graphics
API, an application will 

(A) initialize one or more GHCs. Each GHC represents a unique configuration 
of the graphics video card(s). In typical cases, one GHC is initialized and 
configured to be shared by the entire application. 

(B) create one or more GDCs. Each GDC supports drawing to distinct areas 
on the application window. For example, an application might create a 
GDC for each view component in an application. 

(C) draw using a GDC. An application draws to a desired window area via
the corresponding GDC. Referring to Figure 19.19, an application sets the
rendering state (C1) and then issues drawing commands to the GDC (C2).
Setting of the rendering state involves setting of all relevant primitive and 
illumination attributes and computing/loading appropriate transformation 
matrices into the matrix processors. A drawing command is typically a 
series of vertex positions accompanied by instructions on how to interpret 
the vertices (e.g., two vertex positions and an instruction that these are end 
points of a line). 

In practice, modern graphics APIs are highly configurable and support many
abstract programming modes. For example, Microsoft’s Direct3D supports a
drawing mode where the matrix processors can be bypassed entirely (e.g., when
vertices are pre-transformed).


19.4.3 Implementation Details 

Figure 19.20 shows the design of our implementation for the solution presented
in Section 19.3.3.¹ Here, the MainUIWindow object represents the entire ball
shooting program. This object contains the GUI elements (slider bars, quit button,
etc.), the model (application state), and two instances of view/controller pairs (one
each for Large View and Small View).

¹ Source code for this section can be found at http://faculty.washington.edu/ksung/fcg3/ball.tar.zip

Figure 19.20. Implementation of the ball shooting program with two views.

OpenGL with FLTK 

Figure 19.21 shows a screen shot of Fluid, FLTK’s GUI builder, during the
construction of the GUI for the ball shooting program. In the lower-right corner of
Figure 19.21, we see that (A) Fluid allows an application developer to
interactively place graphical representations of GUI elements (3D-looking icons); (B)
is an area representing the application window. In addition (C), the application
developer can interactively select each GUI element to define its physical
appearance (color, shape, size, etc.). In the lower-left corner of Figure 19.21, we see
that (D) the application developer has the option to type in program fragments
to service events generated by the corresponding GUI element. In this case, we
can see that the developer must type in the program fragment for handling the X
velocity slider bar events. Notice that this program fragment is separated from
the rest of the program source code system and is associated with Fluid (the GUI
builder). At the conclusion of the GUI layout design, Fluid generates new source
code files to be included with the rest of the application development environment.
Since these source code files are controlled and generated by the GUI builder, the
application developer must invoke the GUI builder in order to update/maintain
the event service routines. In this way, FLTK implements external service linkage
as described in Section 19.4.1. In our implementation, we instruct Fluid to create
a UserInterface class (.h and .cpp files) for the integration with the rest of our
application development environment.

[Figure 19.21 annotation (D): the application developer types in this code to
service the X velocity slider bar event.]

Figure 19.21. Fluid: FLTK’s GUI Builder.

// Forward declaration of mouse event service routines
void ServiceMouse(int button, int state, int x, int y); // service mouse button click
void ServiceActiveMouse(int x, int y);                  // service mouse drag

class MainUIWindow {
    UserInterface UI;         // This is linkage code generated by Fluid (GUI builder).
                              // This object services events generated by GUI elements.
    Model *TheModel;          // The application state (Figure 19.11)
    FlGlutWindow *LargeView;  // These are view/controller pairs that understand graphics
    FlGlutWindow *SmallView;  // outputs (GDC) and mouse events (controller)

    MainUIWindow(Model *m) {  // The constructor
        TheModel = m;                            // Sets the model ...
        LargeView = new FlGlutWindow(TheModel);  // Create LargeView
        LargeView->mouse = ServiceMouse;         // callback functions for servicing mouse events
        LargeView->motion = ServiceActiveMouse;
        // Create SmallView ... exactly the same as LargeView (not shown)
        glutTimerFunc( /* set up timer and services */ );  // Set up timer ...
    }
};

Figure 19.22. MainUIWindow based on OpenGL and FLTK.

Figure 19.22 shows the MainUIWindow implementation with OpenGL and
FLTK. In this case, graphics operations are performed through OpenGL and user
interface operations are supported by FLTK. As described, the UserInterface
object in the MainUIWindow is created by Fluid for servicing GUI events.
TheModel is the application state as detailed in Figure 19.11. The two FlGlutWindow
objects are based on a predefined FLTK class designed specifically for
supporting drawing with OpenGL. The constructor of MainUIWindow shows that the
mouse event services are registered via a callback mechanism. As discussed in
Section 19.2.4, FLTK (the Fast Light ToolKit) is an example of a lightweight
GUI API. Here, we see examples of using callbacks as a registration mechanism
for receiving user events.

FlGlutWindow is an FLTK pre-defined Fl_Glut_Window class object (see
Figure 19.23) designed specifically to support drawing with OpenGL. Each instance
of a FlGlutWindow object is a combination of a controller (e.g., to receive mouse
events) and a Graphics Device Context (GDC). We see that the draw() function
first sets the rendering state (e.g., clear color and matrix values), including
computing and programming the matrix processor (e.g., GL_PROJECTION), before
calling TheModel to redraw the application state.

// Fl_Glut_Window is a pure virtual class supplied by FLTK specifically for supporting
// windows with OpenGL output and for receiving mouse events.
class FlGlutWindow : public Fl_Glut_Window {
    FlGlutWindow(Model *m);          // Constructor
    Model *TheModel;                 // The application state: initialized during construction time.
    float WorldWidth, WorldHeight;   // World space dimension
    void HardwareToWorldPoint(int hwX, int hwY, float &wcX, float &wcY);
                                     // Transform mouse clicks (hwX, hwY) to world coordinate (wcX, wcY)
    virtual void draw() {            // virtual function from Fl_Glut_Window for drawing
        glClearColor(0.8f, 0.8f, 0.95f, 0.0f);
        glClear(GL_COLOR_BUFFER_BIT);      // Clearing the background color
        glMatrixMode(GL_PROJECTION);       // Programming OpenGL's GL_PROJECTION
        glLoadIdentity();                  // matrix processor to the proper transform
        gluOrtho2D(0.0f, WorldWidth, 0.0f, WorldHeight);
        TheModel->DrawApplicationState();  // Drawing of the application state
    }
};

Figure 19.23. FlGlutWindow: OpenGL/FLTK view/controller pair.
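
Figure 19.23 declares HardwareToWorldPoint() without showing its body. The C++ sketch below is one plausible way to write such a mapping, not the authors' code. It assumes that the projection gluOrtho2D(0, worldWidth, 0, worldHeight) fills the whole window and that the hardware origin is the top-left corner with y increasing downward. The window pixel dimensions are passed as extra parameters, so the signature is hypothetical and differs from the one in the figure.

```cpp
#include <cassert>

// Hypothetical helper: map a mouse position in window pixels to world
// coordinates. Assumes the world rectangle [0, worldWidth] x [0, worldHeight]
// covers the entire window and hardware y grows downward from the top-left.
void hardwareToWorldPoint(int hwX, int hwY,
                          int windowWidthPx, int windowHeightPx,
                          float worldWidth, float worldHeight,
                          float& wcX, float& wcY) {
    assert(windowWidthPx > 0 && windowHeightPx > 0);
    // Fraction of the way across the window, scaled to world units.
    wcX = (static_cast<float>(hwX) / windowWidthPx) * worldWidth;
    // Flip y: hardware y grows downward, world y grows upward.
    wcY = (static_cast<float>(windowHeightPx - hwY) / windowHeightPx) * worldHeight;
}
```

For a 400 x 200 pixel window showing a 20 x 10 world, a click at pixel (100, 50) maps to world position (5, 7.5). If the toolkit reports mouse positions with a bottom-left origin, the y flip should be removed.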

Direct3D with MFC 

Figure 19.24 shows a screen shot of the MFC resource editor, MFC’s GUI builder,
during the construction of the ball shooting program. Similar to Fluid
(Figure 19.21), in the middle of Figure 19.24, (A) we see that the resource editor
also supports interactive designing of the GUI element layout in (B), an area
representing the application window. Although the GUI builder interfaces operate
differently, we observe that in (C), the MFC resource editor also supports the
definition/modification of the physical appearance of GUI elements. However,
unlike Fluid, the MFC resource editor is tightly integrated with the rest of the
development environment. In this case, a developer can register for event
services by inheriting or overriding appropriate service routines. The MFC resource
editor automatically inserts code fragments into the application source code
system. To support this functionality, the application source code organization is
governed/shared with the GUI builder; the application developer is not entirely
free to rename files/classes and/or to reorganize the implementation source code
file system structure. MFC implements internal direct code modification for event
service linkage, as described in Section 19.4.1.

Figure 19.24. The MFC resource editor.

class MainUIWindow : public CDialog {
    Model *TheModel;               // The application state (Figure 19.11)
    LPDIRECT3D9 TheGHC;            // This is the Graphics Hardware Context
    CWndD3D *LargeView;            // These are view/controller pairs that understand drawing
    CWndD3D *SmallView;            // with D3D (GDC) and UI element events (controller)
    CSliderCtrl XSlider, YSlider;  // These are the GUI elements
    CStringEcho StatusEcho;
    void OnTimer();                // Override the timer service function
    void OnHScroll(...);           // Override the scroll bar service function
};

Figure 19.25. MainUIWindow based on Microsoft Direct3D and MFC.

Figure 19.25 shows the MainUIWindow implementation with Direct3D and
MFC. In this implementation, graphics operations are performed through
Direct3D while user interface operations are supported by MFC. Once again,
TheModel is the application state as detailed in Figure 19.11. LPDIRECT3D9 is
the Graphics Hardware Context (GHC) interface object. This object is created
and initialized in the MainUIWindow constructor (not shown here). The two
CWndD3D objects are defined to support drawing with Direct3D. We notice that
one major difference between Figure 19.25 and Figure 19.22 is in the GUI
element support. In Figure 19.25, we see that the GUI element objects (e.g., XSlider)
and the corresponding service routines (e.g., OnHScroll()) are integrated into the
MainUIWindow object. This is in contrast to the solution shown in Figure 19.22
where GUI elements are grouped into a separate object (e.g., the UserInterface
object) with callback event service registrations. As discussed in Section 19.2.4,
MFC is an example of a large commercial GUI API, where many event services
are registered based on object-oriented function overrides (e.g., the OnHScroll()
and OnTimer() functions).
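
The two registration styles contrasted above can be sketched side by side in plain C++. This is only an illustration under stated assumptions: CallbackWindow, BaseWindow, AppWindow, and dispatchMouse are hypothetical names that mimic the shape of the FLTK/GLUT callback style and the MFC override style without using either library.

```cpp
#include <functional>
#include <string>
#include <vector>

// Style 1 (callback registration, as in Figure 19.22): the window stores a
// service routine that the application registers at run time.
struct CallbackWindow {
    std::function<void(int, int)> mouse;  // registered service routine, may be empty
    void dispatchMouse(int x, int y) {
        if (mouse) mouse(x, y);           // unregistered events are ignored
    }
};

// Style 2 (function override, as in Figure 19.25): the base class declares a
// virtual service routine and the application overrides it in a subclass.
class BaseWindow {
public:
    virtual ~BaseWindow() = default;
    void dispatchMouse(int x, int y) { onMouse(x, y); }
protected:
    virtual void onMouse(int, int) {}     // default service: ignore the event
};

class AppWindow : public BaseWindow {
public:
    std::vector<std::string> log;         // records serviced events for inspection
protected:
    void onMouse(int x, int y) override {
        log.push_back("override:" + std::to_string(x) + "," + std::to_string(y));
    }
};
```

In the first style the linkage is data (a stored function), so it can be changed while the program runs. In the second it is fixed by the class hierarchy at compile time, which is what allows a GUI builder to insert service routines directly into application classes.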
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// CWnd is the MFC base class for all window objects. Here we subclass to create a D3D output 
//window by including a D3D Graphics Device Context. 
class CWndD3D : public CWnd {
    LPDIRECT3DDEVICE9 D3DDevice;  // This is the D3D Graphics Device Context (GDC)
    Model *TheModel;              // The application state
    void InitD3D(LPDIRECT3D9);    // Create D3DDevice (GDC) to connect to GHC
    void RedrawView() {           // Draws the application state
        // Compute world coordinate to device transform
        D3DMATRIX transform = ComputeTransformation();
        // Programming the D3D WORLD matrix with the computed transform matrix
        D3DDevice->SetTransform(D3DTS_WORLD, &transform);
        D3DDevice->Clear(bgColor, D3DCLEAR_TARGET);
        D3DDevice->BeginScene();
        TheModel->DrawApplicationState(D3DDevice);
        D3DDevice->EndScene();
        D3DDevice->Present();
    }
    void HardwareToWorldPoint(CPoint hwPt, float &wcX, float &wcY);
                                  // Transform mouse clicks (hwPt) to world coordinate (wcX, wcY)
    void OnLButtonDown(CPoint hwPt);  // Override mouse button/drag service functions
};

Figure 19.26. CWndD3D: Direct3D/MFC view/controller pair.


CWndD3D is a subclass of the MFC CWnd class (see Figure 19.26). CWnd
is the base class designed for a generic MFC window. By subclassing from this
base class, CWndD3D can support all default window-related events (e.g., mouse
events). The LPDIRECT3DDEVICE9 object is the D3D Graphics Device Context
(GDC) interface object. The InitD3D() function creates and initializes the
GDC object and connects this object to the LPDIRECT3D9 (GHC). In this way,
a CWndD3D subclass is a basic view/controller pair: it supports the view
functionality with drawing via the D3D GDC and controller functionality with input
via MFC. The RedrawView() function is similar to the draw() function of
Figure 19.23 where we first set up the rendering state (e.g., bgColor and matrix),
including programming the matrix processor (e.g., D3DTS_WORLD), before
calling the model to draw itself.

In conclusion, we see that Figure 19.20 represents an implementation of the
solution presented in Section 19.3.3 while Section 19.4.3 presented two versions
of the implementation for Figure 19.20. Although the GUI builder, event service
registration, and actual API function calls are very different, the final
programming source code structures are remarkably similar. In fact, the two versions
share the exact same source code files for the Model class. In addition, although
the drawing functions for CirclePrimitive are different for OpenGL and D3D, we
were able to share the source code files for the rest of the primitive behaviors (e.g.,
set center/radius, travel with velocity, collide, etc.). We reaffirm our assertion that
software frameworks, solution structures, and event implementations should be
designed independently of any APIs.
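
One way to obtain this kind of sharing is to keep the API-specific drawing call behind a small interface, as in the hedged C++ sketch below. The class and method names (CircleDrawer, RecordingDrawer, travel, collidesWith) are hypothetical and are not taken from the authors' source code. RecordingDrawer merely stands in for an OpenGL or Direct3D back end so that the sketch stays self-contained.

```cpp
#include <cassert>

// Hypothetical API-neutral drawing interface; one subclass per graphics API.
class CircleDrawer {
public:
    virtual ~CircleDrawer() = default;
    virtual void drawCircle(float cx, float cy, float radius) = 0;
};

// Shared primitive behavior: nothing here mentions OpenGL or Direct3D.
class CirclePrimitive {
public:
    CirclePrimitive(float cx, float cy, float r) : cx_(cx), cy_(cy), r_(r) {
        assert(r > 0.0f);
    }
    void setVelocity(float vx, float vy) { vx_ = vx; vy_ = vy; }
    void travel(float dt) { cx_ += vx_ * dt; cy_ += vy_ * dt; }
    // Two circles collide when their centers are no farther apart than the
    // sum of their radii (compared in squared form to avoid a square root).
    bool collidesWith(const CirclePrimitive& o) const {
        float dx = cx_ - o.cx_, dy = cy_ - o.cy_, rr = r_ + o.r_;
        return dx * dx + dy * dy <= rr * rr;
    }
    // The only API-dependent step is delegated to the drawer.
    void draw(CircleDrawer& d) const { d.drawCircle(cx_, cy_, r_); }
private:
    float cx_, cy_, r_;
    float vx_ = 0.0f, vy_ = 0.0f;
};

// Stand-in back end that records the last request instead of rendering.
struct RecordingDrawer : CircleDrawer {
    int calls = 0;
    float lastX = 0.0f, lastY = 0.0f, lastRadius = 0.0f;
    void drawCircle(float cx, float cy, float radius) override {
        ++calls; lastX = cx; lastY = cy; lastRadius = radius;
    }
};
```

An OpenGL build and a Direct3D build would each supply their own CircleDrawer subclass while compiling the same CirclePrimitive source file.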


19.5 Applying Our Results 

We have seen that the event-driven programming model is well suited for
designing and implementing programs that interact with users. In addition, we have seen
that the model-view-controller framework is a convenient and powerful structure
for organizing functional modules in an interactive graphics application. In
developing a solution to the ball shooting program, we have demonstrated that
knowledge from event-driven programming helps us design the controller component
(e.g., handling of mouse events, etc.), computer graphics knowledge helps us
design the view component (e.g., transformation and drawing of circles, etc.), while
the model component is highly dependent upon the specific application (e.g., free
falling and colliding circles). Our discussion so far has been based on a very
simple example. We will now explore the applicability of the MVC framework and
its implementation in real-world applications.


19.5.1 Example 1: PowerPoint 

Figure 19.27 shows how we can apply our knowledge in analyzing and gaining
insights into Microsoft PowerPoint,² a popular interactive graphics application. A
screen shot of a slide creation session using the PowerPoint application is shown
at the left of Figure 19.27. The right side of Figure 19.27 shows how we can
apply the implementation framework to gain insights into the PowerPoint
application. The MainUIWindow at the right of Figure 19.27 is the GUI window of
the entire application, and it contains the GUI elements that affect/echo the entire
application state (e.g., main menu, status area, etc.). We can consider the
MainUIWindow as the module that contains TheModel component and includes the four
view/controller pairs.

Recall that TheModel is the state of the application and that this component
contains all the data that the user interactively creates. In the case of PowerPoint,
the user creates a collection of presentation slides, and thus TheModel contains all
the information about these slides (e.g., layout design style, content of the slides,
notes associated with each slide, etc.). With this understanding of TheModel
component, the rest of the application can be considered as a convenient tool for
presenting TheModel (the view) to the user and changing TheModel (the controller)
by the user. In this way, these convenient tools are precisely the view/controller
pairs (e.g., ViewController components from Figure 19.16).

² PowerPoint is a registered trademark of Microsoft.

Figure 19.27. Understanding PowerPoint using the MVC implementation framework.

In Figure 19.27, each of the four view/controller pairs (i.e., OverviewPane,
WorkPane, StylePane, and NotesPane) presents, and supports changing, a
different aspect of TheModel component:

• OverviewPane. The view component displays multiple consecutive slides
from all the slides that the user has created; the controller component
supports user scrolling through all these slides and selecting one for editing.

• WorkPane. The view component displays the details of the slide that is
currently being edited; the controller supports selecting and editing the content
of this slide.

• StylePane. The view component displays the layout design of the slide
that is currently being edited; the controller supports selecting and defining
a new layout design for this slide.

• NotesPane. The view component displays the notes that the user has
created for the slide that is currently being edited; the controller supports
editing of these notes.

As is the case with most modern interactive applications, PowerPoint defines an
application timer event to support user-defined animations (e.g., animated
sequences between slide transitions). The coherency of the four view/controller
pairs can be maintained during the servicing of this application timer event. For
example, the user works with the StylePane to change the layout of the current
slide in TheModel component. In the meantime, before servicing the next timer
event, OverviewPane and WorkPane are not aware of the changes and display an
out-of-date design for the current slide. During the servicing of the timer event,
the MainUIWindow forces all view/controller pairs to poll TheModel and refresh
their contents. As discussed in Section 19.3.4, since the timer events are typically
triggered more than 30 times in a second, the user is not able to detect the brief
out-of-date display and observes a consistent display at all times. In this way, the
four view/controller pairs only need to keep a reference to TheModel component
and do not need to have any knowledge of each other. Thus, it is straightforward
to insert and delete view/controller pairs into/from the application.

Figure 19.28. Understanding Maya with the MVC implementation framework.
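
The timer-driven refresh described above can be sketched in a few lines of C++. This is a minimal illustration, not a description of how PowerPoint is built: the Model here holds a single hypothetical layoutStyle field, and ViewController, addView, and onTimer are names invented for the sketch.

```cpp
#include <vector>

// Hypothetical application state, reduced to one field for illustration.
struct Model { int layoutStyle = 0; };

// Each view/controller pair knows only the model, never the other pairs.
class ViewController {
public:
    explicit ViewController(const Model* m) : model_(m) {}
    void refresh() { shownLayout_ = model_->layoutStyle; }  // poll the model
    int shownLayout() const { return shownLayout_; }        // what is on screen
private:
    const Model* model_;
    int shownLayout_ = 0;
};

class MainUIWindow {
public:
    MainUIWindow() = default;
    MainUIWindow(const MainUIWindow&) = delete;  // views point at this object's model
    MainUIWindow& operator=(const MainUIWindow&) = delete;

    Model model;
    std::vector<ViewController> views;

    void addView() { views.emplace_back(&model); }
    // Timer service routine: force every pair to poll the model and refresh.
    void onTimer() {
        for (ViewController& v : views) v.refresh();
    }
};
```

Between two timer events a view may show stale data, but after onTimer() every view agrees with the model again. Adding or removing a pane only changes the views container, since no pane refers to another.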


19.5.2 Example 2: Maya 

We now apply our knowledge in analyzing and understanding Maya,³ an
interactive 3D modeling/animation/rendering system. The left side of Figure 19.28
shows a screen shot of Maya in a simple 3D content creation session. As in the
case of Figure 19.27, the right side of Figure 19.28 shows how we can apply
the implementation framework to gain insights into the Maya application. Once
again we see that the MainUIWindow is the GUI window of the entire application
containing GUI elements that affect/echo the entire application state, TheModel
component, and all the view/controller pairs.

³ Maya is a registered trademark of Alias.

Since Maya is a 3D media creation system, TheModel component contains 3D
content information (e.g., scene graph, 3D geometry, material properties, lighting,
camera, animation, etc.). Once again, the rest of the components in the
MainUIWindow are designed to let the user view and change TheModel. Here
is the functionality of the four view/controller pairs:

• GraphPane. The view component displays the scene graph of the 3D
content; the controller component supports navigating the graph and selecting
scene nodes in the graph.

• CameraPane. The view component renders the scene graph from a
camera viewing position; the controller component supports manipulating the
camera view and selecting objects in the scene.

• MaterialPane. The view component displays all the defined materials; the
controller component supports selecting and editing materials.

• OutlinePane. The view component displays all the transform nodes in the
scene; the controller component supports manipulating the transforms (e.g.,
create/change parent-child relationships, etc.).

Once again, the coherency among the different view/controller pairs can be
maintained while servicing the application timer events.

We do not claim that PowerPoint or Maya is implemented according to
our framework. These are highly sophisticated commercial applications and the
underlying implementation is certainly much more complex. However, based on
the knowledge we have gained from this chapter, we can begin to understand how
to approach discussing, designing, and building such interactive graphics
applications. Remember that the important lesson we want to learn from this chapter
is how to organize the functionality of an interactive graphics application into
components and understand how the components interact so that we can better
understand, maintain, modify, and expand an interactive graphics application.


Notes 

I first learned about the model-view-controller framework and event-driven
programming from SmallTalk (Goldberg & Robson, 1989). (You may also want to
refer to the SmallTalk web site, http://www.smalltalk.org/main/.) Both Design
Patterns: Elements of Reusable Object-Oriented Software (Gamma et al., 1995)
and Pattern-Oriented Software Architecture (Buschmann et al., 1996) are
excellent sources for finding out more about design patterns and software
architecture frameworks in general. I recommend 3D Game Engine Architecture (Eberly,
2004) as a good source for discussions on issues relating to implementing
real-time graphics systems. I learned MFC and Direct3D mainly by referring to the
online Microsoft Developer Network pages (http://msdn.microsoft.com). In
addition, I find Prosise’s book Programming Windows with MFC (Prosise, 1999) to be
very helpful. I refer to the OpenGL Programming Guide (Shreiner et al., 2004),
Reference Manual (Shreiner, 2004), and FLTK online help (http://www.fltk.org/)
when developing my OpenGL/FLTK programs.


Exercises 

1. Here is the specification for dragging out a line: 

• Left mouse button (LMB) clicks define the center of the line.

• LMB drags out a line such that the line extends in two directions. The
first direction extends from the center (LMB click) position toward the
current mouse position. The second direction extends in the opposite
direction from the first with exactly the same length.

• Right mouse button (RMB) click-drag moves the line such that the
center of the line follows the current mouse position.

(a) Follow the steps outlined in Section 19.2.3 and design an event-driven
programming solution for this specification.

(b) Implement your design with FLTK and OpenGL.

(c) Implement your design with MFC and Direct-3D. 

Notice that in this case the useful application internal state information (the 
center position of the line) and the drawing presentation requirements (end 
points of the line) do not coincide exactly. When defining the application 
state, we should pay attention to what is the most important and convenient 
information to store in order to support the specified functionality. 

2. For the line defined in Exercise 1, define a velocity whose direction is
given by the slope of the line: once created, the line will travel along the
direction defined by its slope. Use the length of the line as the speed. (Note that
longer lines travel faster than shorter lines.)
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3. Here is the specification for dragging out a rectangle: 

• LMB click defines the center of the rectangle. 

• LMB drags out a rectangle such that the rectangle extends from the
center position and one of the corner positions of the rectangle always
follows the current mouse position.

• RMB click-drag moves the rectangle such that the center of the
rectangle follows the current mouse position.

(a) Follow the steps outlined in Section 19.2.3 and design an event-driven 
programming solution for this specification. 

(b) Implement your design with FLTK and OpenGL. 

(c) Implement your design with MFC and Direct-3D. 

4. For the rectangle in Exercise 3: 

(a) Support the definition of a velocity similar to that of HeroBall velocity
in Section 19.1: once created, the rectangle will travel along a
direction that is the vector defined from its center towards the LMB release
position.

(b) Design and implement collision between two rectangles (this is a
simple 2D bound intersection check).

5. With results from Exercise 4, we can approximate a simple Pong game: 

• The paddles are rectangles; 

• A pong-ball is drawn as a circle but we will use the bounding square (a 
square that centers at the center of the circle, with dimension defined 
by the diameter of the circle) to approximate collision with the paddle. 

Design and implement a single-player pong-game where a ball (circle) 
drops under gravitational force and the user must manipulate a paddle to 
bounce the ball upward to prevent it from dropping below the application 
window. You should: 

(a) design a specification (similar to that of Section 19.1) for this pong 
game; 

(b) follow the steps outlined in Section 19.2.3 to design an event-driven 
programming solution; 

(c) implement your design either with OpenGL or Direct-3D. 
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6. Extend the Application View in Figure 19.12 to include functionality for 
setting a world coordinate window bound. The world coordinate window
bound defines a rectangular region in the world for displaying in the
viewport. Define a method for setting the world coordinate window bound and
modify the ApplicationView::DeviceToWorldXform() function to support
transforming mouse clicks to world coordinate space.

7. Integrate your results from Exercise 6 into the two-view ball shooting
program from Figure 19.14 such that the small view can be focused around the
current HeroBall. When there is no current HeroBall, the small view should
display nothing. When the user LMB click-drags, or when the user selects a
HeroBall with the RMB, the small view’s world coordinate window bound should center
at the HeroBall center and include a region that is 1.5 times the HeroBall
diameter.




In this chapter, we discuss the practical issues of measuring light, usually called
radiometry. The terminology and notation of radiometry may at first seem
strange and hard to keep straight. However, because radiometry is so
fundamental to computer graphics, it is worth studying until it sinks in.
This chapter also covers photometry, which takes radiometric
quantities and scales them to estimate how much “useful” light is present. For
example, a green light may seem twice as bright as a blue light of the same power
because the eye is more sensitive to green light. Photometry attempts to quantify
such distinctions.


20.1 Radiometry 

Although we can define radiometric units in many systems, we use SI
(International System of Units) units. Familiar SI units include the metric units of meter
(m) and gram (g). Light is fundamentally a propagating form of energy, so it is
useful to define the SI unit of energy, which is the joule (J).


20.1.1 Photons 

To aid our intuition, we will describe radiometry in terms of collections of large
numbers of photons, and this section establishes what is meant by a photon in this
context. For the purposes of this chapter, a photon is a quantum of light that has
a position, direction of propagation, and a wavelength $\lambda$. Somewhat strangely,
the SI unit used for wavelength is the nanometer (nm). This is mainly for historical
reasons, and $1\ \mathrm{nm} = 10^{-9}\ \mathrm{m}$. Another unit, the angstrom, is sometimes used, and
one nanometer is ten angstroms. A photon also has a speed $c$ that depends only
on the refractive index $n$ of the medium through which it propagates. Sometimes
the frequency $f = c/\lambda$ is also used for light. This is convenient because unlike
$\lambda$ and $c$, $f$ does not change when the photon refracts into a medium with a new
refractive index. Another invariant measure is the amount of energy $q$ carried by
a photon, which is given by the following relationship:

$$q = hf = \frac{hc}{\lambda}, \tag{20.1}$$

where $h = 6.63 \times 10^{-34}\ \mathrm{J\,s}$ is Planck’s constant. Although these quantities can
be measured in any unit system, we will use SI units whenever possible.


20.1.2 Spectral Energy 

If we have a large collection of photons, their total energy $Q$ can be computed
by summing the energy $q_i$ of each photon. A reasonable question to ask is “How
is the energy distributed across wavelengths?” An easy way to answer this is to
partition the photons into bins, essentially histogramming them. We then have
an energy associated with an interval. For example, we can count all the energy
between $\lambda = 500$ nm and $\lambda = 600$ nm and have it turn out to be 10.2 J, and this
might be denoted $q[500, 600] = 10.2$. If we divided the wavelength interval into
two 50 nm intervals, we might find that $q[500, 550] = 5.2$ and $q[550, 600] = 5.0$.
This tells us there was a little more energy in the short wavelength half of the
interval [500, 600]. If we divide into 25 nm bins, we might find $q[500, 525] = 2.5$,
and so on. The nice thing about the system is that it is straightforward. The bad
thing about it is that the choice of the interval size determines the number.

A more commonly used system is to divide the energy by the size of the
interval. So instead of $q[500, 600] = 10.2$ we would have

$$Q_\lambda[500, 600] = \frac{10.2}{100} = 0.102\ \mathrm{J\,(nm)^{-1}}.$$

This approach is nice, because the size of the interval has much less impact on
the overall size of the numbers. An immediate idea would be to drive the interval
size $\Delta\lambda$ to zero. This could be awkward, because for a sufficiently small $\Delta\lambda$, $Q_\lambda$
will be either huge or zero, depending on whether there is a single photon or no
photon in the interval. There are two schools of thought to solve that dilemma.
The first is to assume that $\Delta\lambda$ is small, but not so small that the quantum nature of
light comes into play. The second is to assume that the light is a continuum rather
than individual photons, so a true derivative $dQ/d\lambda$ is appropriate. Both ways of
thinking about it are appropriate and lead to the same computational machinery.
In practice, it seems that most people who measure light prefer small, but finite,
intervals, because that is what they can measure in the lab. Most people who
do theory or computation prefer infinitesimal intervals, because that makes the
machinery of calculus available.
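
The finite-interval view can be written down directly. The following C++ sketch bins a collection of photons by wavelength and divides the binned energy by the interval width. The Photon struct and the spectralEnergy function are hypothetical names, and the half-open interval [lo, hi) is a choice made here so that adjacent bins do not count a photon twice.

```cpp
#include <cassert>
#include <vector>

// Hypothetical photon record: wavelength in nanometers, energy in joules.
struct Photon {
    double wavelengthNm;
    double energyJ;
};

// Energy per unit wavelength, in J/nm, over the interval [loNm, hiNm):
// sum the energy of every photon in the bin, then divide by the bin width.
double spectralEnergy(const std::vector<Photon>& photons,
                      double loNm, double hiNm) {
    assert(hiNm > loNm);
    double binEnergyJ = 0.0;
    for (const Photon& p : photons) {
        if (p.wavelengthNm >= loNm && p.wavelengthNm < hiNm) {
            binEnergyJ += p.energyJ;
        }
    }
    return binEnergyJ / (hiNm - loNm);
}
```

With 10.2 J of total energy between 500 nm and 600 nm, halving the bin width roughly halves the binned energy but leaves the value returned here nearly unchanged, which is the point of dividing by the interval size.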

The quantity $Q_\lambda$ is called spectral energy, and it is an intensive quantity as
opposed to an extensive quantity such as energy, length, or mass. Intensive quantities
can be thought of as density functions that tell the density of an extensive quantity
at an infinitesimal point. For example, the energy $Q$ at a specific wavelength is
probably zero, but the spectral energy (energy density) $Q_\lambda$ is a meaningful
quantity. A probably more familiar example is that the population of a country may
be 25 million, but the population at a point in that country is meaningless.
However, the population density measured in people per square meter is meaningful,
provided it is measured over large enough areas. Much like with photons,
population density works best if we pretend that we can view population as a continuum
where population density never becomes granular even when the area is small.

We will follow the convention of graphics where spectral energy is almost
always used, and energy is rarely used. This results in a proliferation of $\lambda$ subscripts
if “proper” notation is used. Instead, we will drop the subscript and use $Q$ to
denote spectral energy. This can result in some confusion when people outside of
graphics read graphics papers, so be aware of this standards issue. Your intuition
about spectral energy might be aided by imagining a measurement device with an
energy sensor that measures light energy $q$. If you place a colored filter in front of
the sensor that allows only light in the interval $[\lambda - \Delta\lambda/2, \lambda + \Delta\lambda/2]$, then the
spectral energy at $\lambda$ is $Q = \Delta q/\Delta\lambda$.


20.1.3 Power 

It is useful to estimate a rate of energy production for light sources. This rate is 
called power, and it is measured in watts, W, which is another name for joules 
per second. This is easiest to understand in a steady state, but because power is 
an intensive quantity (a density over time), it is well defined even when energy 
production is varying over time. The units of power may be more familiar, e.g., a 
100-watt light bulb. Such bulbs draw approximately 100 J of energy each second. 
The power of the light produced will actually be less than 100 W because of 
heat loss, etc., but we can still use this example to help understand more about
photons. For example, we can get a feel for how many photons are produced in a
second by a 100 W light. Suppose the average photon produced has the energy of
a $\lambda = 500$ nm photon. The frequency of such a photon is


$$f = \frac{c}{\lambda} = \frac{3 \times 10^{8}\ \mathrm{m\,s^{-1}}}{500 \times 10^{-9}\ \mathrm{m}} = 6 \times 10^{14}\ \mathrm{s^{-1}}.$$


The energy of that photon is $hf \approx 4 \times 10^{-19}$ J. That means a
staggering $10^{20}$ photons are produced each second, even if the bulb is not
very efficient. This explains why simulating a camera with a fast shutter speed
and directly simulated photons is an inefficient choice for producing images.
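
As a rough check of the arithmetic above, the following Python sketch recomputes the frequency, the energy per photon, and the photon rate. It assumes, as the text does for illustration, that all 100 W leaves the bulb as 500 nm light; the rounded constants and the function name are our own choices.

```python
# Rough photon-rate estimate for a light source.
# Assumption: all of the power is emitted as photons of a single wavelength.

PLANCK_H = 6.626e-34   # Planck's constant, J s
LIGHT_SPEED = 3.0e8    # speed of light, m/s (rounded as in the text)


def photons_per_second(power_watts, wavelength_m):
    """Return (frequency in Hz, energy per photon in J, photons per second)."""
    frequency = LIGHT_SPEED / wavelength_m
    photon_energy = PLANCK_H * frequency
    return frequency, photon_energy, power_watts / photon_energy


freq, energy, rate = photons_per_second(100.0, 500e-9)
print(f"f = {freq:.1e} Hz, hf = {energy:.1e} J, rate = {rate:.1e} photons/s")
```

The rate comes out at a few times $10^{20}$ photons per second, which agrees with the order of magnitude quoted above.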

As with energy, we are really interested in spectral power measured in
W (nm)$^{-1}$. Again, although the formal standard symbol for spectral power is
$\Phi_\lambda$, we will use $\Phi$ with no subscript for convenience and
consistency with most of the graphics literature. One thing to note is that the
spectral power for a light source is usually a smaller number than the power. For
example, if a light emits a power of 100 W evenly distributed over wavelengths
400 nm to 800 nm, then the spectral power will be
100 W/400 nm = 0.25 W (nm)$^{-1}$. This is something to keep in mind if you set
the spectral power of light sources by hand for debugging purposes.
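
When setting spectral power by hand, a small helper can make the unit conversion explicit. The sketch below is only an illustration with names of our own choosing; it converts a total power spread evenly over a band into spectral power and then sums it back over the band as a sanity check.

```python
def uniform_spectral_power(total_power_w, lambda_min_nm, lambda_max_nm):
    """Spectral power in W/nm for power spread evenly over a wavelength band."""
    return total_power_w / (lambda_max_nm - lambda_min_nm)


def total_power(spectral_power_w_per_nm, lambda_min_nm, lambda_max_nm, n=400):
    """Recover power in W by summing spectral power over n equal sub-bands."""
    d_lambda = (lambda_max_nm - lambda_min_nm) / n
    return sum(spectral_power_w_per_nm * d_lambda for _ in range(n))


phi = uniform_spectral_power(100.0, 400.0, 800.0)
print(phi)                             # spectral power in W/nm
print(total_power(phi, 400.0, 800.0))  # back to watts
```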

The measurement device for spectral energy in the last section could be modified
by taking a reading with a shutter that is open for a time interval $\Delta t$
centered at time $t$. The spectral power would then be
$\Phi = \Delta q/(\Delta t\,\Delta\lambda)$.


20.1.4 Irradiance 


The quantity irradiance arises naturally if you ask the question “How much light
hits this point?” Of course the answer is “none,” and again we must use a density
function. If the point is on a surface, it is natural to use area to define our
density function. We modify the device from the last section to have a finite
area $\Delta A$ sensor that is smaller than the light field being measured. The
spectral irradiance $H$ is just the power per unit area $\Delta\Phi/\Delta A$.
Fully expanded this is

$$H = \frac{\Delta q}{\Delta A\,\Delta t\,\Delta\lambda}. \tag{20.2}$$


Thus, the full units of irradiance are J m$^{-2}$ s$^{-1}$ (nm)$^{-1}$. Note that
the SI units for irradiance include inverse-meter-squared for area and
inverse-nanometer for wavelength. This seeming inconsistency (using both nanometer
and meter) arises because of the natural units for area and visible light
wavelengths.

When the light is leaving a surface, e.g., when it is reflected, the same quantity
as irradiance is called radiant exitance, $E$. It is useful to have different words
for incident and exitant light, because the same point has potentially different
irradiance and radiant exitance.


20.1.5 Radiance 


Although irradiance tells us how much light is arriving at a point, it tells us little 
about the direction that light comes from. To measure something analogous to 
what we see with our eyes, we need to be able to associate “how much light” with 
a specific direction. We can imagine a simple device to measure such a quantity 
(Figure 20.1). We use a small irradiance meter and add a conical “baffler” which 
limits light hitting the counter to a range of angles with solid angle $\Delta\sigma$. The
response of the detector is as follows: 


$$\text{response} = \frac{\Delta H}{\Delta\sigma} = \frac{\Delta q}{\Delta A\,\Delta\sigma\,\Delta t\,\Delta\lambda}.$$

Figure 20.1. By adding a blinder that shows only a small solid angle
$\Delta\sigma$ to the irradiance detector, we measure radiance.


This is the spectral radiance of light travelling in space. Again, we will drop the 
“spectral” in our discussion and assume that it is implicit. 



Figure 20.2. The signal a radiance detector receives does not depend on the distance to 
the surface being measured. This figure assumes the detectors are pointing at areas on the 
surface that are emitting light in the same way. 

Figure 20.3. The irradiance at the surface as masked by the cone is smaller than
that measured at the detector by a cosine factor.

Figure 20.4. The direction $\mathbf{k}$ has a differential solid angle $d\sigma$
associated with it.


Radiance is what we are usually computing in graphics programs. A wonderful
property of radiance is that it does not vary along a line in space. To see
why this is true, examine the two radiance detectors both looking at a surface
as shown in Figure 20.2. Assume the lines the detectors are looking along are
close enough together that the surface is emitting/reflecting light “the same” in
both of the areas being measured. Because the area of the surface being sampled
is proportional to squared distance, and because the light reaching the detector
from each unit of that area is inversely proportional to squared distance, the
two factors cancel and the two detectors should have the same reading.

It is useful to measure the radiance hitting a surface. We can think of placing
the cone baffler from the radiance detector at a point on the surface and
measuring the irradiance $H$ on the surface originating from directions within the
cone (Figure 20.3). Note that the surface “detector” is not aligned with the cone.
For this reason we need to add a cosine correction term to our definition of
radiance:

$$\text{response} = \frac{\Delta H}{\Delta\sigma \cos\theta} = \frac{\Delta q}{\Delta A \cos\theta\,\Delta\sigma\,\Delta t\,\Delta\lambda}.$$


As with irradiance and radiant exitance, it is useful to distinguish between
radiance incident at a point on a surface and exitant from that point. Terms for
these concepts sometimes used in the graphics literature are surface radiance
$L_s$ for the radiance of (leaving) a surface, and field radiance $L_f$ for the
radiance incident at a surface. Both require the cosine term, because they both
correspond to the configuration in Figure 20.3:

$$L_s = \frac{\Delta E}{\Delta\sigma \cos\theta}, \qquad L_f = \frac{\Delta H}{\Delta\sigma \cos\theta}.$$

Radiance and Other Radiometric Quantities 


If we have a surface whose field radiance is $L_f$, then we can derive all of the
other radiometric quantities from it. This is one reason radiance is considered the
“fundamental” radiometric quantity. For example, the irradiance can be expressed
as

$$H = \int_{\text{all } \mathbf{k}} L_f(\mathbf{k}) \cos\theta \, d\sigma.$$

This formula has several notational conventions that are common in graphics
that make such formulae opaque to readers not familiar with them (Figure 20.4).
First, $\mathbf{k}$ is an incident direction and can be thought of as a unit
vector, a direction, or a $(\theta, \phi)$ pair in spherical coordinates with
respect to the surface normal. The direction has a differential solid angle
$d\sigma$ associated with it. The field radiance is potentially different for
every direction, so we write it as a function $L_f(\mathbf{k})$.

As an example, we can compute the irradiance $H$ at a surface that has constant
field radiance $L_f$ in all directions. To integrate, we use a classic spherical
coordinate system and recall that the differential solid angle is

$$d\sigma = \sin\theta \, d\theta \, d\phi,$$

so the irradiance is

$$H = \int_{\phi=0}^{2\pi} \int_{\theta=0}^{\pi/2} L_f \cos\theta \sin\theta \, d\theta \, d\phi = \pi L_f.$$

This relation shows us our first occurrence of a potentially surprising constant
$\pi$. These factors of $\pi$ occur frequently in radiometry and are an artifact
of how we chose to measure solid angles, i.e., the area of a unit sphere is a
multiple of $\pi$ rather than a multiple of one.
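
The factor of $\pi$ can also be checked numerically. The following Python sketch approximates the hemispherical integral with a midpoint rule; the function name and the sample count are arbitrary choices of ours, and the code is meant only as a sanity check of the closed form.

```python
import math


def irradiance_constant_field(field_radiance, n_theta=2000):
    """Midpoint-rule estimate of the hemispherical integral of L_f cos(theta)."""
    d_theta = (math.pi / 2.0) / n_theta
    inner = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        inner += field_radiance * math.cos(theta) * math.sin(theta) * d_theta
    # The integrand does not depend on phi, so the phi integral contributes 2*pi.
    return 2.0 * math.pi * inner


H = irradiance_constant_field(1.0)
print(H, math.pi)  # the two values should agree closely
```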

Similarly, we can find the power hitting a surface by integrating the irradiance
across the surface area:

$$\Phi = \int_{\text{all } \mathbf{x}} H(\mathbf{x}) \, dA,$$

where $\mathbf{x}$ is a point on the surface, and $dA$ is the differential area
associated with that point. Note that we don’t have special terms or symbols for
incoming versus outgoing power. That distinction does not seem to come up enough
to have encouraged the distinction.


20.1.6 BRDF 

Because we are interested in surface appearance, we would like to characterize
how a surface reflects light. At an intuitive level, for any incident light coming
from direction $\mathbf{k}_i$, there is some fraction scattered in a small solid
angle near the outgoing direction $\mathbf{k}_o$. There are many ways we could
formalize such a concept, and not surprisingly, the standard way to do so is
inspired by building a simple measurement device. Such a device is shown in
Figure 20.5, where a small light source is positioned in direction $\mathbf{k}_i$
as seen from a point on a surface, and a detector is placed in direction
$\mathbf{k}_o$. For every directional pair $(\mathbf{k}_i, \mathbf{k}_o)$, we take
a reading with the detector.

Now we just have to decide how to measure the strength of the light source
and make our reflection function independent of this strength. For example, if we
replaced the light with a brighter light, we would not want to think of the surface
as reflecting light differently. We could place a radiance meter at the point being
illuminated to measure the light. However, for this to get an accurate reading that
would not depend on the $\Delta\sigma$ of the detector, we would need the light to
subtend a solid angle bigger than $\Delta\sigma$. Unfortunately, the measurement
taken by our roving radiance detector in direction $\mathbf{k}_o$ will also count
light that comes from points outside the new detector’s cone. So this does not
seem like a practical solution.

Figure 20.5. A simple measurement device for directional reflectance. The positions
of light and detector are moved to each possible pair of directions. Note that both
$\mathbf{k}_i$ and $\mathbf{k}_o$ point away from the surface to allow reciprocity.

Alternatively, we can place an irradiance meter at the point on the surface being
measured. This will take a reading that does not depend strongly on subtleties
of the light source geometry. This suggests characterizing reflectance as a ratio:

$$\rho = \frac{L_s}{H},$$

where this fraction $\rho$ will vary with incident and exitant directions
$\mathbf{k}_i$ and $\mathbf{k}_o$, $H$ is the irradiance for light position
$\mathbf{k}_i$, and $L_s$ is the surface radiance measured in direction
$\mathbf{k}_o$. If we take such a measurement for all direction pairs, we end up
with a 4D function $\rho(\mathbf{k}_i, \mathbf{k}_o)$. This function is called the
bidirectional reflectance distribution function (BRDF). The BRDF is all we need to
know to characterize the directional properties of how a surface reflects light.

Directional Hemispherical Reflectance 

Given a BRDF it is straightforward to ask “What fraction of incident light is
reflected?” However, the answer is not so easy; the fraction reflected depends on
the directional distribution of incoming light. For this reason, we typically only
set a fraction reflected for a fixed incident direction $\mathbf{k}_i$. This
fraction is called the directional hemispherical reflectance. This fraction,
$R(\mathbf{k}_i)$, is defined by

$$R(\mathbf{k}_i) = \frac{\text{power in all outgoing directions } \mathbf{k}_o}{\text{power in a beam from direction } \mathbf{k}_i}.$$


Note that this quantity is between zero and one for reasons of energy conservation.
If we allow the incident power $\Phi_i$ to hit a small area $\Delta A$, then the
irradiance is $\Phi_i/\Delta A$. Also, the ratio of the outgoing power to the
incoming power is just the ratio of the radiant exitance to irradiance:

$$R(\mathbf{k}_i) = \frac{E}{H}.$$

The radiance in a particular direction resulting from this power is, by the
definition of the BRDF,

$$L(\mathbf{k}_o) = H \rho(\mathbf{k}_i, \mathbf{k}_o) = \frac{\Phi_i}{\Delta A}\,\rho(\mathbf{k}_i, \mathbf{k}_o).$$


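And from the definition of radiance, we also have

$$L(\mathbf{k}_o) = \frac{\Delta E}{\Delta\sigma_o \cos\theta_o},$$

where $\Delta E$ is the radiant exitance of the small patch in direction
$\mathbf{k}_o$. Using these two definitions for radiance we get

$$H \rho(\mathbf{k}_i, \mathbf{k}_o) = \frac{\Delta E}{\Delta\sigma_o \cos\theta_o}.$$

Rearranging terms, we get

$$\frac{\Delta E}{H} = \rho(\mathbf{k}_i, \mathbf{k}_o)\,\Delta\sigma_o \cos\theta_o.$$

This is just the small contribution to $E/H$ that is reflected near the particular
$\mathbf{k}_o$. To find the total $R(\mathbf{k}_i)$, we sum over all outgoing
$\mathbf{k}_o$. In integral form this is

$$R(\mathbf{k}_i) = \int_{\text{all } \mathbf{k}_o} \rho(\mathbf{k}_i, \mathbf{k}_o) \cos\theta_o \, d\sigma_o.$$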


Ideal Diffuse BRDF 

An idealized diffuse surface is called Lambertian. Such surfaces are impossible in
nature for thermodynamic reasons, but mathematically they do conserve energy.
The Lambertian BRDF has $\rho$ equal to a constant for all angles. This means the
surface will have the same radiance for all viewing angles, and this radiance will
be proportional to the irradiance.

If we compute $R(\mathbf{k}_i)$ for a Lambertian surface with $\rho = C$ we get

$$R(\mathbf{k}_i) = \int_{\text{all } \mathbf{k}_o} C \cos\theta_o \, d\sigma_o
= \int_{\phi_o=0}^{2\pi} \int_{\theta_o=0}^{\pi/2} C \cos\theta_o \sin\theta_o \, d\theta_o \, d\phi_o
= \pi C.$$

Thus, for a perfectly reflecting Lambertian surface ($R = 1$), we have
$\rho = 1/\pi$, and for a Lambertian surface where $R(\mathbf{k}_i) = r$, we have

$$\rho(\mathbf{k}_i, \mathbf{k}_o) = \frac{r}{\pi}.$$

This is another example where the use of a steradian for the solid angle determines
the normalizing constant and thus introduces factors of $\pi$.
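
The normalization can be verified with a short numerical experiment. The Python sketch below evaluates the directional hemispherical reflectance of a BRDF by a midpoint-rule sum over outgoing directions and applies it to a Lambertian BRDF. The function names, the angle-based parameterization of directions, and the sample counts are assumptions made for this illustration.

```python
import math


def lambertian_brdf(reflectance):
    """Return a constant BRDF rho = r / pi as a function of the four angles."""
    return lambda theta_i, phi_i, theta_o, phi_o: reflectance / math.pi


def directional_hemispherical_reflectance(brdf, theta_i, phi_i,
                                           n_theta=200, n_phi=400):
    """Midpoint-rule estimate of the integral of rho * cos(theta_o) d(sigma_o)."""
    d_theta = (math.pi / 2.0) / n_theta
    d_phi = 2.0 * math.pi / n_phi
    total = 0.0
    for a in range(n_theta):
        theta_o = (a + 0.5) * d_theta
        # cos(theta_o) times the differential solid angle sin(theta_o) dtheta dphi
        weight = math.cos(theta_o) * math.sin(theta_o) * d_theta * d_phi
        for b in range(n_phi):
            phi_o = (b + 0.5) * d_phi
            total += brdf(theta_i, phi_i, theta_o, phi_o) * weight
    return total


R = directional_hemispherical_reflectance(lambertian_brdf(0.8), 0.3, 0.0)
print(R)  # should be close to the reflectance 0.8 passed in
```

The same routine could be pointed at any other BRDF written with this signature to check that it does not reflect more energy than it receives.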


20.2 Transport Equation 

With the definition of BRDF, we can describe the radiance of a surface in terms of
the incoming radiance from all different directions. Because in computer graphics
we can use idealized mathematics that might be impractical to instantiate in the
lab, we can also write the BRDF in terms of radiance only. If we take a small part
of the light with solid angle $\Delta\sigma_i$ with radiance $L_i$ and “measure”
the reflected radiance in direction $\mathbf{k}_o$ due to this small piece of the
light, we can compute a BRDF (Figure 20.6). The irradiance due to the small piece
of light is $H = L_i \cos\theta_i \, \Delta\sigma_i$. Thus the BRDF is

$$\rho = \frac{L_o}{L_i \cos\theta_i \, \Delta\sigma_i}.$$

Figure 20.6. The geometry for the transport equation in its directional form.

This form can be useful in some situations. Rearranging terms, we can write down
the part of the radiance that is due to light coming from direction $\mathbf{k}_i$:


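$$\Delta L_o = \rho(\mathbf{k}_i, \mathbf{k}_o) \, L_i \cos\theta_i \, \Delta\sigma_i.$$

If there is light coming from many directions $L_i(\mathbf{k}_i)$, we can sum all
of them. In integral form, with notation for surface and field radiance, this is

$$L_s(\mathbf{k}_o) = \int_{\text{all } \mathbf{k}_i} \rho(\mathbf{k}_i, \mathbf{k}_o) \, L_f(\mathbf{k}_i) \cos\theta_i \, d\sigma_i.$$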

This is often called the rendering equation in computer graphics (Immel et al., 
1986). 
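
To make the “sum over all directions” concrete, here is a minimal Python sketch that evaluates the directional form of the transport equation with a midpoint-rule sum over the incident hemisphere. The function names, the angle-based parameterization, and the example field radiance (brightest along the normal, falling off with the cosine) are illustrative assumptions, not part of the formulation above.

```python
import math


def surface_radiance(brdf, field_radiance, theta_o, phi_o,
                     n_theta=100, n_phi=200):
    """Sum rho * L_f * cos(theta_i) * d(sigma_i) over the incident hemisphere."""
    d_theta = (math.pi / 2.0) / n_theta
    d_phi = 2.0 * math.pi / n_phi
    total = 0.0
    for a in range(n_theta):
        theta_i = (a + 0.5) * d_theta
        d_sigma = math.sin(theta_i) * d_theta * d_phi  # differential solid angle
        for b in range(n_phi):
            phi_i = (b + 0.5) * d_phi
            total += (brdf(theta_i, phi_i, theta_o, phi_o)
                      * field_radiance(theta_i, phi_i)
                      * math.cos(theta_i) * d_sigma)
    return total


def diffuse(theta_i, phi_i, theta_o, phi_o):
    """Lambertian BRDF with reflectance 0.5 (hypothetical test surface)."""
    return 0.5 / math.pi


def sky(theta_i, phi_i):
    """Hypothetical field radiance that falls off with the cosine of the angle."""
    return math.cos(theta_i)


L_s = surface_radiance(diffuse, sky, 0.0, 0.0)
print(L_s)  # analytic value for this setup is (0.5 / pi) * (2 * pi / 3) = 1/3
```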

Sometimes it is useful to write the transport equation in terms of surface
radiances only (Kajiya, 1986). Note that in a closed environment, the field
radiance $L_f(\mathbf{k}_i)$ comes from some surface with surface radiance
$L_s(-\mathbf{k}_i) = L_f(\mathbf{k}_i)$ (Figure 20.7). The solid angle subtended
by the point $\mathbf{x}'$ in the figure is given by

$$\Delta\sigma_i = \frac{\Delta A' \cos\theta'}{\|\mathbf{x} - \mathbf{x}'\|^2},$$

where $\Delta A'$ is the area we associate with $\mathbf{x}'$. Substituting for
$\Delta\sigma_i$ in terms of $\Delta A'$ suggests the following transport equation:

$$L_s(\mathbf{x}, \mathbf{k}_o) = \int_{\text{all } \mathbf{x}' \text{ visible to } \mathbf{x}} \frac{\rho(\mathbf{k}_i, \mathbf{k}_o) \, L_s(\mathbf{x}', \mathbf{x} - \mathbf{x}') \cos\theta_i \cos\theta'}{\|\mathbf{x} - \mathbf{x}'\|^2} \, dA'.$$

Note that we are using a non-normalized vector $\mathbf{x} - \mathbf{x}'$ to
indicate the direction from $\mathbf{x}'$ to $\mathbf{x}$. Also note that we are
writing $L_s$ as a function of position and direction.

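The only problem with this new transport equation is that the domain of
integration is awkward. If we introduce a visibility function, we can trade off
complexity in the domain with complexity in the integrand:

$$L_s(\mathbf{x}, \mathbf{k}_o) = \int_{\text{all } \mathbf{x}'} \frac{\rho(\mathbf{k}_i, \mathbf{k}_o) \, L_s(\mathbf{x}', \mathbf{x} - \mathbf{x}') \, v(\mathbf{x}, \mathbf{x}') \cos\theta_i \cos\theta'}{\|\mathbf{x} - \mathbf{x}'\|^2} \, dA',$$

where

$$v(\mathbf{x}, \mathbf{x}') = \begin{cases} 1 & \text{if } \mathbf{x} \text{ and } \mathbf{x}' \text{ are mutually visible,} \\ 0 & \text{otherwise.} \end{cases}$$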
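
The purely geometric part of this integrand, the visibility times the two cosines divided by the squared distance, is easy to isolate in code. The sketch below is one possible way to compute it for two surface points with unit normals. The function names are ours, and visibility is passed in as a flag because a real implementation would obtain it from a ray cast that is outside the scope of this example.

```python
import math


def sub(a, b):
    """Component-wise difference of two 3D points given as tuples."""
    return tuple(ai - bi for ai, bi in zip(a, b))


def dot(a, b):
    """Dot product of two 3D vectors given as tuples."""
    return sum(ai * bi for ai, bi in zip(a, b))


def geometry_term(x, n, x_prime, n_prime, visible=True):
    """Visibility * cos(theta_i) * cos(theta') / squared distance.

    n and n_prime are assumed to be unit normals at x and x_prime.
    """
    if not visible:
        return 0.0
    d = sub(x_prime, x)                   # vector from x toward x'
    dist2 = dot(d, d)
    dist = math.sqrt(dist2)
    cos_i = dot(n, d) / dist              # angle at x between n and the direction to x'
    cos_p = -dot(n_prime, d) / dist       # angle at x' between n' and the direction to x
    if cos_i <= 0.0 or cos_p <= 0.0:
        return 0.0                        # the two points face away from each other
    return cos_i * cos_p / dist2


# Two patches facing each other, two units apart.
g = geometry_term((0.0, 0.0, 0.0), (0.0, 0.0, 1.0),
                  (0.0, 0.0, 2.0), (0.0, 0.0, -1.0))
print(g)
```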



Figure 20.7. The light coming into one point comes from another point.


20.3 Photometry

Figure 20.8. The luminous efficiency function versus wavelength (nm).


For every spectral radiometric quantity there is a related photometric quantity
that measures how much of that quantity is “useful” to a human observer. Given
a spectral radiometric quantity $f_r(\lambda)$, the related photometric quantity
$f_p$ is

$$f_p = 683 \, \frac{\mathrm{lm}}{\mathrm{W}} \int_{\lambda = 380\,\mathrm{nm}}^{800\,\mathrm{nm}} \bar{y}(\lambda) \, f_r(\lambda) \, d\lambda,$$

where $\bar{y}$ is the luminous efficiency function of the human visual system.
This function is zero outside the limits of integration above, so the limits could
be 0 and $\infty$ and $f_p$ would not change. The luminous efficiency function will
be discussed in more detail in Chapter 21, but we discuss its general properties
here. The leading constant is to make the definition consistent with historical
absolute photometric quantities.

The luminous efficiency function is not equally sensitive to all wavelengths
(Figure 20.8). For wavelengths below 380 nm (the ultraviolet range), the light is
not visible to humans and thus has a $\bar{y}$ value of zero. From 380 nm it
gradually increases until $\lambda = 555$ nm where it peaks. This is a pure green
light. Then, it gradually decreases until it reaches the boundary of the infrared
region at 800 nm.

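The photometric quantity that is most commonly used in graphics is luminance,
the photometric analog of radiance:

$$Y = 683 \, \frac{\mathrm{lm}}{\mathrm{W}} \int_{\lambda = 380\,\mathrm{nm}}^{800\,\mathrm{nm}} \bar{y}(\lambda) \, L(\lambda) \, d\lambda.$$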

The symbol $Y$ for luminance comes from colorimetry. Most other fields use the
symbol $L$; we will not follow that convention because it is too confusing to use
$L$ for both luminance and spectral radiance. Luminance gives one a general idea of
how “bright” something is independent of the adaptation of the viewer. Note that
black paper under noonday sun is subjectively darker than the lower luminance
white paper under moonlight; reading too much into luminance is dangerous, but
it is a very useful quantity for getting a quantitative feel for relative
perceivable light output. The unit lm stands for lumens. Note that most light
bulbs are rated in terms of the power they consume in watts, and the useful light
they produce in lumens. More efficient bulbs produce more of their light where
$\bar{y}$ is large and thus produce more lumens per watt. A “perfect” light would
convert all power into 555 nm light and would produce 683 lumens per watt. The
units of luminance are thus (lm/W)(W/(m$^2$ sr)) = lm/(m$^2$ sr). The quantity one
lumen per steradian is defined to be one candela (cd), so luminance is usually
described in units cd/m$^2$.
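
The conversion from a spectral radiometric quantity to its photometric counterpart can be sketched as a simple numerical integral. In the Python example below, the bell-shaped curve `ybar_standin` is a crude placeholder that merely peaks at 555 nm; it is not the tabulated luminous efficiency function, and any numbers it produces are illustrative only. A real implementation would substitute the standard table. The function names and the flat 100 W test source are likewise our own assumptions.

```python
import math

LM_PER_W = 683.0  # leading constant from the text


def ybar_standin(lam_nm):
    """Crude bell-shaped stand-in peaking at 555 nm. NOT the standard table."""
    return math.exp(-0.5 * ((lam_nm - 555.0) / 45.0) ** 2)


def photometric(f_r, ybar=ybar_standin, lam_min=380.0, lam_max=800.0, n=420):
    """Midpoint-rule estimate of 683 times the integral of ybar * f_r."""
    d_lam = (lam_max - lam_min) / n
    total = 0.0
    for i in range(n):
        lam = lam_min + (i + 0.5) * d_lam
        total += ybar(lam) * f_r(lam) * d_lam
    return LM_PER_W * total


def flat_100w(lam_nm):
    """100 W spread evenly over 400-800 nm, i.e., 0.25 W/nm inside the band."""
    return 0.25 if 400.0 <= lam_nm <= 800.0 else 0.0


lumens = photometric(flat_100w)
print(lumens / 100.0)  # lumens per watt for this source under the stand-in curve
```

Because the flat source spends much of its power where the efficiency curve is small, its lumens-per-watt figure falls well below the 683 lm/W of the “perfect” 555 nm light.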



Frequently Asked Questions

• What is “intensity”? 

The term intensity is used in a variety of contexts and its use varies with both era 
and discipline. In practice, it is no longer meaningful as a specific radiometric 
quantity, but it is useful for intuitive discussion. Most papers that use it do so in 
place of radiance. 

• What is “radiosity”? 

The term radiosity is used in place of radiant exitance in some fields. It is also 
sometimes used to describe world-space light transport algorithms. 


Notes 

A common radiometric quantity not described in this chapter is radiant intensity
($I$), which is the spectral power per steradian emitted from an infinitesimal
point source. It should usually be avoided in graphics programs because point
sources cause implementation problems. A more rigorous treatment of radiometry can
be found in Analytic Methods for Simulated Light Transport (Arvo, 1995). The
radiometric and photometric terms in this chapter are from the Illuminating
Engineering Society’s standard that is increasingly used by all fields of science
and engineering (American National Standard Institute, 1986). A broader discussion
of radiometric and appearance standards can be found in Principles of Digital
Image Synthesis (Glassner, 1995).


Exercises 

1. For a diffuse surface with outgoing radiance L, what is the radiant exitance? 

2. What is the total power exiting a diffuse surface with an area of 4 m$^2$ and a
radiance of $L$?

3. If a fluorescent light and an incandescent light both consume 20 watts of 
power, why is the fluorescent light usually preferred? 



Erik Reinhard and Garrett Johnson 
Color


Photons are the carriers of optical information. They propagate through media 
taking on properties associated with waves. At surface boundaries they interact
with matter, behaving more as particles. They can also be absorbed by the
retina, where the information they carry is transcoded into electrical signals that 
are subsequently processed by the brain. It is only there that a sensation of color 
is generated. 

As a consequence, the study of color in all its guises touches upon several 
different fields: physics for the propagation of light through space; chemistry for 
its interaction with matter; neuroscience and psychology for aspects relating to 
perception and cognition of color (Reinhard et al., 2008). 

In computer graphics, we traditionally take a simplified view of how light 
propagates through space. Photons travel along straight paths until they hit a sur¬ 
face boundary and are then reflected according to a reflection function of some 
sort. A single photon will carry a certain amount of energy, which is represented 
by its wavelength. Thus, a photon will have only one wavelength. The relationship
between its wavelength $\lambda$ and the amount of energy it carries ($\Delta E$)
is given by

$$\lambda \, \Delta E = 1239.9,$$

where $\Delta E$ is measured in electron volts (eV) and $\lambda$ in nanometers.
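
As a small worked example of this relationship, the Python sketch below converts a wavelength in nanometers to photon energy in electron volts and then to joules. The function names and the rounded electron-volt conversion factor are our own additions.

```python
EV_TO_JOULE = 1.602e-19  # joules per electron volt (rounded)


def photon_energy_ev(wavelength_nm):
    """Photon energy in eV from the relation lambda * dE = 1239.9 (lambda in nm)."""
    return 1239.9 / wavelength_nm


def photon_energy_joule(wavelength_nm):
    """Photon energy in joules, converted from electron volts."""
    return photon_energy_ev(wavelength_nm) * EV_TO_JOULE


for lam in (380.0, 500.0, 800.0):
    print(lam, photon_energy_ev(lam), photon_energy_joule(lam))
```

Shorter wavelengths carry more energy per photon: across the visible range the energy roughly halves between 380 nm and 800 nm.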

In computer graphics, it is not very efficient to simulate single photons; instead
large collections of them are simulated at the same time. If we take a very
large number of photons, each carrying a possibly different amount of energy,
then together they represent a spectrum. A spectrum can be thought of as a graph
where the number of photons is plotted against wavelength. Because two photons
of the same wavelength carry twice as much energy as a single photon of that
wavelength, this graph can also be seen as a plot of energy against wavelength.
An example of a spectrum is shown in Figure 21.1. The range of wavelengths to
which humans are sensitive is roughly between 380 and 800 nanometers (nm).

Figure 21.1. A spectrum describes how much energy is available at each wavelength
$\lambda$, here measured as relative radiant power. This specific spectrum
represents average daylight.

When simulating light, it would therefore be possible to trace rays that each
carry a spectrum. A renderer that accomplishes this is normally called a spectral
renderer. From preceding chapters it should be clear that we do not normally
go to the expense of building spectral renderers. Instead, we replace
spectra with representations that typically use red, green, and blue components.
The reason that this is possible at all has to do with human vision and will be
discussed later in this chapter.

Simulating light by tracing rays takes care of the physics of light, although it 
should be noted that several properties of light, including for instance polarization, 
diffraction, and interference, are not modeled in this manner. 

At surface boundaries, we normally model what happens with light by means
of a reflectance function. These functions can be measured directly by means
of gonioreflectometers, leading to a large amount of tabulated data, which can be
more compactly represented by various different functions. Nonetheless, these
reflectance functions are empirical in nature, i.e., they abstract away the chemistry
that happens when a photon is absorbed and re-emitted by an electron. Thus,
reflectance functions are useful for modeling in computer graphics, but do not
offer an explanation as to why certain wavelengths of light are absorbed and others
are reflected. We can therefore not use reflectance functions to explain why the
light reflected off a banana has a spectral composition that appears to us as yellow.
For that, we would have to study molecular orbital theory, a topic beyond the
scope of this book.

Finally, when light reaches the retina, it is transcoded into electrical signals 
that are propagated to the brain. A large part of the brain is devoted to processing 
visual signals, part of which gives rise to the sensation of color. Thus, even if 
we know the spectrum of light that is reflected off a banana, we do not know yet 
why humans associate the term “yellow” with it. Moreover, as we will find out in 
the remainder of this chapter, our perception of color is vastly more complicated 
than it would seem at first glance. It changes with illumination, varies between 
observers, and varies within an observer over time. 

In other words, the spectrum of light coming off a banana is perceived in the
context of an environment. To predict how an observer perceives a “banana
spectrum” requires knowledge of the environment that contains the banana as well
as the observer’s environment. In many instances, these two environments are the
same. However, when we are displaying a photograph of a banana on a monitor,
then these two environments will be different. As human visual perception
depends on the environment the observer is in, the observer may perceive the
banana in the photograph differently from how an observer directly looking at the
banana would perceive it. This has a significant impact on how we should deal with
color and illustrates the complexities associated with color.

To emphasize the crucial role that human vision plays, we only have to look 
at the definition of color: “Color is the aspect of visual perception by which an 
observer may distinguish differences between two structure-free fields of view of 
the same size and shape, such as may be caused by differences in the spectral 
composition of the radiant energy concerned in the observation” (Wyszecki & 
Stiles, 2000). In essence, without a human observer there is no color. 

Luckily, much of what we know about color can be quantified, so that we 
can carry out computations to correct for the idiosyncrasies of human vision and 
thereby display images that will appear to observers the way the designer of those 
images intended. This chapter contains the theory and mathematics required to 
do so. 


21.1 Colorimetry 

Colorimetry is the science of color measurement and description. Since color 
is ultimately a human response, color measurement should begin with human 
observation. The photodetectors in the human retina consist of rods and cones.
The rods are highly sensitive and come into play in low light conditions. Under 
normal lighting conditions, the cones are operational, mediating human vision. 
There are three cone types and together they are primarily responsible for color 
vision. 

Although it may be possible to directly record the electrical output of cones
while some visual stimulus is being presented, such a procedure would be invasive,
while at the same time ignoring the sometimes substantial differences between
observers. Moreover, much of the measurement of color was developed
well before such direct recording techniques were available.

The alternative is to measure color by means of measuring the human response
to patches of color. This leads to color matching experiments, which will
be described later in this section. Carrying out these experiments has resulted in
several standardized observers, which can be thought of as statistical
approximations of actual human observers. First, however, we need to describe some
of the assumptions underlying the possibility of color matching, which are
summarized by Grassmann’s laws.

21.1.1 Grassmann’s Laws 

Given that humans have three different cone types, the experimental laws of 
color matching can be summed up as the trichromatic generalization (Wyszecki 
& Stiles, 2000), which states that any color stimulus can be matched completely 
with an additive mixture of three appropriately modulated color sources. This 
feature of color is often used in practice, for instance by televisions and monitors 
which reproduce many different colors by adding a mixture of red, green, and 
blue light for each pixel. It is also the reason that renderers can be built using
only three values to describe each color.

The trichromatic generalization allows us to make color matches between any 
given stimulus and an additive mixture of three other color stimuli. Grassmann 
was the first to describe the algebraic rules to which color matching adheres. They 
are known as Grassmann's laws of additive color matching (Grassmann, 1853) 
and are given here. 

• Symmetry law. If color stimulus A matches color stimulus B, then B 
matches A. 

• Transitive law. If A matches B and B matches C, then A matches C. 

• Proportionality law. If A matches B, then aA matches aB, where a is a 
positive scale factor. 
• Additivity law. If A matches B, C matches D, and A + C matches B + D,
then it follows that A + D matches B + C. 

The additivity law forms the basis for color matching and colorimetry as a 
whole. 

21.1.2 Cone Responses 

Each cone type is sensitive to a range of wavelengths, spanning most of the full 
visible range. However, sensitivity to wavelengths is not evenly distributed, but 
contains a peak wavelength at which sensitivity is greatest. The location of this 
peak wavelength is different for each cone type. The three cone types are
classified as S, M, and L cones, where the letters stand for short, medium, and
long, indicating where in the visible spectrum the peak sensitivity is located.

The response of a given cone is then the magnitude of the electrical signal it
outputs, as a function of the spectrum of wavelengths incident upon the cone. The
cone response functions for each cone type as a function of wavelength $\lambda$
are then given by $L(\lambda)$, $M(\lambda)$, and $S(\lambda)$. They are plotted in
Figure 21.2.

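The actual response to a stimulus with a given spectral composition
$\Phi(\lambda)$ is then given for each cone type by

$$L = \int_{\lambda} \Phi(\lambda) \, L(\lambda) \, d\lambda,$$

$$M = \int_{\lambda} \Phi(\lambda) \, M(\lambda) \, d\lambda,$$

$$S = \int_{\lambda} \Phi(\lambda) \, S(\lambda) \, d\lambda.$$

These three integrated responses are known as tristimulus values.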
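
The three integrals can be approximated by sampling the spectrum and the response functions at regular wavelength steps. The Python sketch below does this with Gaussian bumps standing in for $L(\lambda)$, $M(\lambda)$, and $S(\lambda)$. These stand-ins, including their peak positions and widths, are hypothetical shapes chosen only so the example runs; they are not measured cone data. The function names are also ours.

```python
import math


def gaussian(peak_nm, width_nm):
    """Return a bell-shaped function of wavelength centered at peak_nm."""
    return lambda lam: math.exp(-0.5 * ((lam - peak_nm) / width_nm) ** 2)


# Hypothetical stand-ins for the cone response functions (illustrative only).
CONES = {
    "L": gaussian(565.0, 50.0),
    "M": gaussian(540.0, 45.0),
    "S": gaussian(445.0, 30.0),
}


def tristimulus(spectrum, lam_min=380.0, lam_max=800.0, n=420):
    """Midpoint-rule estimate of the three cone integrals for a spectrum."""
    d_lam = (lam_max - lam_min) / n
    values = {}
    for name, response in CONES.items():
        total = 0.0
        for i in range(n):
            lam = lam_min + (i + 0.5) * d_lam
            total += spectrum(lam) * response(lam) * d_lam
        values[name] = total
    return values


def flat(lam_nm):
    """A spectrum with the same power at every wavelength."""
    return 1.0


t1 = tristimulus(flat)
t2 = tristimulus(lambda lam: 2.0 * flat(lam))
print(t1)
print(t2)  # doubling the stimulus doubles every tristimulus value
```

The second call illustrates why the proportionality law holds under this model: the integrals are linear in the stimulus.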
Figure 21.2. The cone response functions for L, M, and S cones.







21.1.3 Color Matching Experiments 

Given that tristimulus values are created by integrating the product of two
functions over the visible range, it is immediately clear that the human visual
system does not act as a simple wavelength detector. Rather, our photo-receptors
act as approximately linear integrators. As a result, it is possible to find two
different spectral compositions, say $\Phi_1(\lambda)$ and $\Phi_2(\lambda)$, that
after integration yield the same response $(L, M, S)$. This phenomenon is known as
metamerism, an example of which is shown in Figure 21.3.

Metamerism is the key feature of human vision that allows the construction of 
color reproduction devices, including the color figures in this book and anything 
reproduced on printers, televisions, and monitors. 

Color matching experiments also rely on the principle of metamerism. Suppose
we have three differently colored light sources, each with a dial to alter its
intensity. We call these three light sources primaries. We should now be able to 
adjust the intensity of each in such a way that when mixed together additively, 
the resulting spectrum integrates to a tristimulus value that matches the perceived 
color of a fourth unknown light source. When we carry out such an experiment, 
we have essentially matched our primaries to an unknown color. The positions of 
our three dials are then a representation of the color of the fourth light source. 

Figure 21.3. Two stimuli $\Phi_1(\lambda)$ and $\Phi_2(\lambda)$ leading to the
same tristimulus values after integration.

In such an experiment, we have used Grassmann’s laws to add the three spectra
of our primaries. We have also used metamerism, because the combined spectrum
of our three primaries is almost certainly different from the spectrum of the
fourth light source. However, the tristimulus values computed from these two
spectra will be identical, having produced a color match.

Note that we do not actually have to know the cone response functions to carry 
out such an experiment. As long as we use the same observer under the same 
conditions, we are able to match colors and record the positions of our dials for 
each color. However, it is quite inconvenient to have to carry out such experiments 
every time we want to measure colors. For this reason, we do want to know the 
spectral cone response functions and average those for a set of different observers 
to eliminate inter-observer variability. 


21.1.4 Standard Observers 

If we perform a color matching experiment for a large range of colors, carried out
by a set of different observers, it is possible to generate an average color
matching dataset. If we specifically use monochromatic light sources against
which to match our primaries, we can repeat this experiment for all visible
wavelengths. The resulting tristimulus values are then called spectral tristimulus
values, and can be plotted against wavelength $\lambda$, as shown in Figure 21.4.

By using a well-defined set of primary light sources, the spectral tristimulus
values lead to three color matching functions. The Commission Internationale
de l'Éclairage (CIE) has defined three such primaries to be monochromatic light
sources of 435.8, 546.1, and 700 nm, respectively. With these three monochromatic
light sources, all other visible wavelengths can be matched by adding different
amounts of each. The amount of each required to match a given wavelength
$\lambda$ is encoded in color matching functions, given by $\bar{r}(\lambda)$,
$\bar{g}(\lambda)$, and $\bar{b}(\lambda)$ and plotted in Figure 21.4. Tristimulus
values associated with these color matching functions are termed $R$, $G$, and $B$.

Figure 21.4. Spectral tristimulus values averaged over many observers, plotted
against wavelength (nm). The primaries were monochromatic light sources with
wavelengths of 435.8, 546.1, and 700 nm.

Given that we are adding light, and light cannot be negative, you may have 
noticed an anomaly in Figure 21.4: to create a match for some wavelengths, it 
is necessary to subtract light. Although there is no such thing as negative light, 
we can use Grassmann’s laws once more, and instead of subtracting light from 
the mixture of primaries, we can add the same amount of light to the color that is 
being matched. 

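The CIE $\bar{r}(\lambda)$, $\bar{g}(\lambda)$, and $\bar{b}(\lambda)$ color
matching functions allow us to determine if a spectral distribution $\Phi_1$
matches a second spectral distribution $\Phi_2$ by simply comparing the resulting
tristimulus values obtained by integrating with these color matching functions:

\[
\begin{aligned}
\int_\lambda \Phi_1(\lambda)\, \bar{r}(\lambda)\, d\lambda &= \int_\lambda \Phi_2(\lambda)\, \bar{r}(\lambda)\, d\lambda, \\
\int_\lambda \Phi_1(\lambda)\, \bar{g}(\lambda)\, d\lambda &= \int_\lambda \Phi_2(\lambda)\, \bar{g}(\lambda)\, d\lambda, \\
\int_\lambda \Phi_1(\lambda)\, \bar{b}(\lambda)\, d\lambda &= \int_\lambda \Phi_2(\lambda)\, \bar{b}(\lambda)\, d\lambda.
\end{aligned}
\]

Of course, a color match is only guaranteed if all three tristimulus values match.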

The importance of these color matching functions lies in the fact that we are 
now able to communicate and describe colors compactly by means of tristimulus 
values. For a given spectral function, the CIE color matching functions provide a 
precise way in which to calculate tristimulus values. As long as everybody uses 
the same color matching functions, it should always be possible to generate a 
match. 

If the same color matching functions are not available, then it is possible to
transform one set of tristimulus values into a different set of tristimulus values
appropriate for a corresponding set of primaries. The CIE has defined one such
transform for two specific reasons. First, in the 1930s numerical integrations
were difficult to perform, and even more so for functions that can be both
positive and negative. Second, the CIE had already developed the photopic
luminance response function, CIE $V(\lambda)$. It became desirable to have three
integrating functions, of which $V(\lambda)$ is one, with all three being positive
over the visible range.

To create a set of positive color matching functions, it is necessary to define
imaginary primaries. In other words, to reproduce any color in the visible
spectrum, we need light sources that cannot be physically realized. The color
matching functions that were settled upon by the CIE are named $\bar{x}(\lambda)$,
$\bar{y}(\lambda)$, and $\bar{z}(\lambda)$ and are shown in Figure 21.5. Note that
$\bar{y}(\lambda)$ is equal to the photopic luminance response function
$V(\lambda)$ and that each of these functions is indeed positive. They are known
as the CIE 1931 standard observer.

Figure 21.5. The CIE $\bar{x}(\lambda)$, $\bar{y}(\lambda)$, and $\bar{z}(\lambda)$
color matching functions, plotted against wavelength (400 to 700 nm).

The corresponding tristimulus values are termed $X$, $Y$, and $Z$, to avoid
confusion with $R$, $G$, and $B$ tristimulus values that are normally associated
with realizable primaries. The conversion from $(R, G, B)$ tristimulus values to
$(X, Y, Z)$ tristimulus values is defined by a simple $3 \times 3$ transform:

\[
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}
= \frac{1}{0.17697}
\begin{bmatrix}
0.4900 & 0.3100 & 0.2000 \\
0.17697 & 0.81240 & 0.01063 \\
0.0000 & 0.0100 & 0.9900
\end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}.
\]


To calculate tristimulus values, we typically directly integrate the standard
observer color matching functions with the spectrum of interest $\Phi(\lambda)$,
rather than go through the CIE $\bar{r}(\lambda)$, $\bar{g}(\lambda)$, and
$\bar{b}(\lambda)$ color matching functions first, followed by the above
transformation. This allows us to calculate consistent color measurements
and also determine when two colors match each other.


21.1.5 Chromaticity Coordinates 

Every color can be represented by a set of three tristimulus values $(X, Y, Z)$.
We could define an orthogonal coordinate system with $X$, $Y$, and $Z$ axes and
plot each color in the resulting 3D space. This is called a color space. The
spatial extent of the volume in which colors lie is then called the color gamut.

Visualizing colors in a 3D color space is fairly difficult. Moreover, the
$Y$-value of any color corresponds to its luminance, by virtue of the fact that
$\bar{y}(\lambda)$ equals $V(\lambda)$. We could therefore project tristimulus
values to a 2D space which approximates chromatic information, i.e., information
which is independent of luminance. This projection is called a chromaticity
diagram and is obtained by normalization while at the same time removing
luminance information:

\[
x = \frac{X}{X + Y + Z}, \qquad
y = \frac{Y}{X + Y + Z}, \qquad
z = \frac{Z}{X + Y + Z}.
\]

Given that $x + y + z$ equals 1, the $z$-value is redundant, allowing us to plot
the $x$ and $y$ chromaticities against each other in a chromaticity diagram.
Although $x$ and $y$ by themselves are not sufficient to fully describe a color,
we can use these two chromaticity coordinates and one of the three tristimulus
values, traditionally $Y$, to recover the other two tristimulus values:

\[
X = \frac{x}{y}\, Y, \qquad
Z = \frac{1 - x - y}{y}\, Y.
\]

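
The projection to chromaticity coordinates and the recovery of $X$ and $Z$ from $(x, y, Y)$ can be sketched as follows. The function names are hypothetical, and the error handling for zero denominators is an added assumption, since the text does not discuss those degenerate cases.

```python
def xyz_to_xyy(X, Y, Z):
    """Project tristimulus values to chromaticity (x, y), keeping luminance Y."""
    total = X + Y + Z
    if total == 0.0:
        raise ValueError("chromaticity is undefined when X + Y + Z is zero")
    return X / total, Y / total, Y


def xyy_to_xyz(x, y, Y):
    """Recover (X, Y, Z) from chromaticity (x, y) and luminance Y."""
    if y == 0.0:
        raise ValueError("cannot recover X and Z when y is zero")
    X = x / y * Y
    Z = (1.0 - x - y) / y * Y
    return X, Y, Z


# White point of ITU-R BT.709 as quoted later in this chapter.
x, y, Y = xyz_to_xyy(95.05, 100.00, 108.90)
print(round(x, 4), round(y, 4))
print(xyy_to_xyz(x, y, Y))
```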

By plotting all monochromatic (spectral) colors in a chromaticity diagram,
we obtain a horseshoe-shaped curve. The points on this curve are called spectrum
loci. All other colors will generate points lying inside this curve. The spectrum
locus for the 1931 standard observer is shown in Figure 21.6. The purple line
between either end of the horseshoe does not represent a monochromatic color,
but rather a combination of short and long wavelength stimuli.

Figure 21.6. The spectrum locus for the CIE 1931 standard observer. (See also
Plate XXVIII.)

Figure 21.7. The chromaticity boundaries of the CIE RGB primaries at 435.8, 546.1, and
700 nm (solid) and a typical HDTV (dashed). (See also Plate XXIX.)

A (non-monochromatic) primary can be integrated over all visible wavelengths,
leading to $(X, Y, Z)$ tristimulus values, and subsequently to an $(x, y)$
chromaticity coordinate, i.e., a point on a chromaticity diagram. Repeating this
for two or more primaries yields a set of points on a chromaticity diagram that can
be connected by straight lines. The area spanned in this manner represents the
range of colors that can be reproduced by the additive mixture of these primaries.
Examples of 3-primary systems are shown in Figure 21.7.
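
For a 3-primary system, the reproducible chromaticities form a triangle, so testing whether a chromaticity can be reproduced reduces to a point-in-triangle test. The sketch below is one possible way to do this, using signed areas; the function names are hypothetical, and the primaries are the ITU-R BT.709 chromaticities listed later in this chapter.

```python
def _cross(o, a, b):
    """Signed area term: z-component of (a - o) x (b - o) for 2D points."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_gamut(p, red, green, blue):
    """True if chromaticity p lies inside or on the primaries' triangle."""
    d1 = _cross(red, green, p)
    d2 = _cross(green, blue, p)
    d3 = _cross(blue, red, p)
    has_negative = d1 < 0 or d2 < 0 or d3 < 0
    has_positive = d1 > 0 or d2 > 0 or d3 > 0
    # Inside (or on an edge) when p is on the same side of all three edges.
    return not (has_negative and has_positive)


# ITU-R BT.709 primaries as (x, y) chromaticities.
RED, GREEN, BLUE = (0.64, 0.33), (0.30, 0.60), (0.15, 0.06)

print(in_gamut((0.3127, 0.3290), RED, GREEN, BLUE))  # the BT.709 white point
print(in_gamut((0.10, 0.80), RED, GREEN, BLUE))      # a highly saturated green
```

The test only concerns chromaticity; whether a specific color is reproducible also depends on its luminance.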

Chromaticity diagrams provide insight into additive color mixtures. However, 
they should be used with care. First, the interior of the horseshoe should not 
be colored, as any color reproduction system will have its own primaries and 
can only reproduce some parts of the chromaticity diagram. Second, as the CIE 
color matching functions do not represent human cone sensitivities, the distance 
between any two points on a chromaticity diagram is not a good indicator for how 
differently these colors will be perceived. 

A more uniform chromaticity diagram was developed to at least in part address
the second of these problems. The CIE $u'v'$ chromaticity diagram provides
a perceptually more uniform spacing and is therefore generally preferred over
$(x, y)$ chromaticity diagrams. It is computed from $(X, Y, Z)$ tristimulus values
by applying a different normalization,

\[
u' = \frac{4X}{X + 15Y + 3Z}, \qquad
v' = \frac{9Y}{X + 15Y + 3Z},
\]

and can alternatively be computed directly from $(x, y)$ chromaticity coordinates:

\[
u' = \frac{4x}{-2x + 12y + 3}, \qquad
v' = \frac{9y}{-2x + 12y + 3}.
\]

A CIE $u'v'$ chromaticity diagram is shown in Figure 21.8.

Figure 21.8. The CIE $u'v'$ chromaticity diagram. (See also Plate XXX.)
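
Both routes to $(u', v')$ should give the same result for the same color. The following sketch, with hypothetical function names, implements the two formulas and compares them for the ITU-R BT.709 white point quoted later in the chapter.

```python
def xyz_to_uv(X, Y, Z):
    """CIE u'v' chromaticity from XYZ tristimulus values."""
    denom = X + 15.0 * Y + 3.0 * Z
    return 4.0 * X / denom, 9.0 * Y / denom


def xy_to_uv(x, y):
    """CIE u'v' chromaticity from (x, y) chromaticity coordinates."""
    denom = -2.0 * x + 12.0 * y + 3.0
    return 4.0 * x / denom, 9.0 * y / denom


X, Y, Z = 95.05, 100.00, 108.90
total = X + Y + Z
u1, v1 = xyz_to_uv(X, Y, Z)
u2, v2 = xy_to_uv(X / total, Y / total)
print(round(u1, 4), round(v1, 4))
print(round(u2, 4), round(v2, 4))
```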


21.2 Color Spaces 

As explained above, each color can be represented by three numbers, for instance
defined by $(X, Y, Z)$ tristimulus values. However, the primaries of the XYZ color
space are imaginary, meaning that it is not possible to construct a device that has
three light sources (all positive) that can reproduce all colors in the visible
spectrum.

For the same reason, image encoding and computations on images may not 
be practical. There is, for instance, a large number of possible XYZ values that 
do not correspond to any physical color. This would lead to inefficient use of 
available bits for storage and to a higher requirement for bit-depth to preserve 
visual integrity after image processing. Although it may be possible to build a 
capture device that has primaries that are close to the CIE XYZ color matching 
functions, the cost of hardware and image processing make this an unattractive 
option. It is not possible to build a display that corresponds to CIE XYZ. For
these reasons, it is necessary to design other color spaces, with goals that include
physical realizability, efficient encoding, perceptual uniformity, and intuitive
color specification.

The CIE XYZ color space is still actively used, mostly for the conversion
between other color spaces. It can be seen as a device-independent color space.
Other color spaces can then be defined in terms of their relationship to CIE XYZ,
which is often specified by a specific transform. For instance, linear and additive
trichromatic display devices can be transformed to and from CIE XYZ by means
of a simple $3 \times 3$ matrix. Some nonlinear additional transform may also be
specified, for instance to minimize perceptual errors when data is stored with a
limited bit-depth, or to enable display directly on devices that have a nonlinear
relationship between input signal and the amount of light emitted.


21.2.1 Constructing a Matrix Transform 

For a display device with three primaries, say red, green, and blue, we can
measure the spectral composition of the emitted light by sending the color vectors
$(1, 0, 0)$, $(0, 1, 0)$, and $(0, 0, 1)$. These vectors represent the three cases
in which one of the primaries is full on and the other two are off. From the
measured spectral output, we can then compute the corresponding chromaticity
coordinates $(x_R, y_R)$, $(x_G, y_G)$, and $(x_B, y_B)$.

The white point of a display is defined as the spectrum emitted when the color
vector $(1, 1, 1)$ is sent to the display. Its corresponding chromaticity coordinate
is $(x_W, y_W)$. The three primaries and the white point characterize the display
and are each required to construct a transformation matrix between the display’s
color space and CIE XYZ.

These four chromaticity coordinates can be extended to chromaticity triplets by
reconstructing the $z$-coordinate from $z = 1 - x - y$, leading to triplets
$(x_R, y_R, z_R)$, $(x_G, y_G, z_G)$, $(x_B, y_B, z_B)$, and $(x_W, y_W, z_W)$.
If we know the maximum luminance of the white point, we can compute its
corresponding tristimulus value $(X_W, Y_W, Z_W)$ and then solve the following
set of equations for the luminance ratio scalars $S_R$, $S_G$, and $S_B$:

\[
\begin{aligned}
X_W &= x_R S_R + x_G S_G + x_B S_B, \\
Y_W &= y_R S_R + y_G S_G + y_B S_B, \\
Z_W &= z_R S_R + z_G S_G + z_B S_B.
\end{aligned}
\]

The conversion between RGB and XYZ is then given by

\[
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}
=
\begin{bmatrix}
x_R S_R & x_G S_G & x_B S_B \\
y_R S_R & y_G S_G & y_B S_B \\
z_R S_R & z_G S_G & z_B S_B
\end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}.
\]

The luminance of any given color can be computed by evaluating the middle row
of a matrix constructed in this manner:

\[
Y = y_R S_R R + y_G S_G G + y_B S_B B.
\]

            R         G         B         White
    x     0.6400    0.3000    0.1500    0.3127
    y     0.3300    0.6000    0.0600    0.3290

Figure 21.9. The $(x, y)$ chromaticity coordinates for the primaries and white point
specified by ITU-R BT.709. The sRGB standard also uses these primaries and white point.


To convert between XYZ and RGB of a given device, the above matrix can 
simply be inverted. 
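
The construction described above can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not a reference one: the function names are hypothetical, the white point luminance is normalized to $Y_W = 1$, and the $3 \times 3$ system is solved with Cramer's rule to keep the example dependency-free.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve a * s = b for s using Cramer's rule."""
    d = det3(a)
    if d == 0.0:
        raise ValueError("primaries are degenerate (singular system)")
    solution = []
    for col in range(3):
        replaced = [[b[r] if c == col else a[r][c] for c in range(3)]
                    for r in range(3)]
        solution.append(det3(replaced) / d)
    return solution


def rgb_to_xyz_matrix(primaries, white_xy, white_Y=1.0):
    """Build the RGB-to-XYZ matrix from (x, y) primaries and white point."""
    # Column c holds the chromaticity triplet (x, y, z) of primary c.
    a = [[x for x, y in primaries],
         [y for x, y in primaries],
         [1.0 - x - y for x, y in primaries]]
    xw, yw = white_xy
    white = [xw / yw * white_Y, white_Y, (1.0 - xw - yw) / yw * white_Y]
    s = solve3(a, white)  # luminance ratio scalars S_R, S_G, S_B
    return [[a[r][c] * s[c] for c in range(3)] for r in range(3)]


# ITU-R BT.709 primaries and white point (Figure 21.9).
m = rgb_to_xyz_matrix([(0.64, 0.33), (0.30, 0.60), (0.15, 0.06)],
                      (0.3127, 0.3290))
for row in m:
    print([round(v, 4) for v in row])
```

The middle row of the resulting matrix gives the luminance weights of the three primaries.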

If an image is represented in an RGB color space for which the primaries and
white point are unknown, then the next best thing is to assume that the image was
encoded in a standard RGB color space. A reasonable choice is then to assume
that the image was specified according to ITU-R BT.709, which is the specification
used for encoding and broadcasting of HDTV. Its primaries and white point
are specified in Figure 21.9. Note that the same primaries and white point are used
to define the well-known sRGB color space. The transformation between this
RGB color space and CIE XYZ, and vice versa, is given by

\[
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}
=
\begin{bmatrix}
0.4124 & 0.3576 & 0.1805 \\
0.2126 & 0.7152 & 0.0722 \\
0.0193 & 0.1192 & 0.9505
\end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix};
\]

\[
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
=
\begin{bmatrix}
3.2405 & -1.5371 & -0.4985 \\
-0.9693 & 1.8760 & 0.0416 \\
0.0556 & -0.2040 & 1.0572
\end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}.
\]



By substituting the maximum RGB values of the device, we can compute
the white point. For ITU-R BT.709, the maximum values are $(R_W, G_W, B_W) =
(100, 100, 100)$, leading to a white point of $(X_W, Y_W, Z_W) = (95.05, 100.00,
108.90)$.

In addition to a linear transformation, the sRGB color space is characterized
by a subsequent nonlinear transform. The nonlinear encoding is given by

\[
\begin{aligned}
R_{\mathrm{sRGB}} &=
\begin{cases}
1.055\, R^{1/2.4} - 0.055 & R > 0.0031308, \\
12.92\, R & R \le 0.0031308;
\end{cases} \\
G_{\mathrm{sRGB}} &=
\begin{cases}
1.055\, G^{1/2.4} - 0.055 & G > 0.0031308, \\
12.92\, G & G \le 0.0031308;
\end{cases} \\
B_{\mathrm{sRGB}} &=
\begin{cases}
1.055\, B^{1/2.4} - 0.055 & B > 0.0031308, \\
12.92\, B & B \le 0.0031308.
\end{cases}
\end{aligned}
\]

This nonlinear encoding helps minimize perceptual errors due to quantization
errors in digital applications.
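
As an illustration, the sRGB encoding can be applied per channel as sketched below. The decoding function is not given in the text; it is an assumption obtained by algebraically inverting the two branches, and all function names are hypothetical.

```python
def srgb_encode(c):
    """Nonlinear sRGB encoding of one linear channel value in [0, 1]."""
    if c <= 0.0031308:
        return 12.92 * c
    return 1.055 * c ** (1.0 / 2.4) - 0.055


def srgb_decode(v):
    """Assumed inverse of srgb_encode (algebraic inversion of each branch)."""
    if v <= 12.92 * 0.0031308:
        return v / 12.92
    return ((v + 0.055) / 1.055) ** 2.4


def to_8bit(v):
    """Quantize an encoded value in [0, 1] to an 8-bit code."""
    return max(0, min(255, round(v * 255.0)))


linear_mid_gray = 0.5
encoded = srgb_encode(linear_mid_gray)
print(round(encoded, 4), to_8bit(encoded))
print(srgb_decode(encoded))
```

Note that a linear value of 0.5 lands well above the middle of the 8-bit range, which reflects how the encoding spends more codes on dark values.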
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21.2.2 Device-Dependent RGB Spaces 


As each device typically has its own set of primaries and white point, we call the 
associated RGB color spaces device-dependent. It should be noted that even if all 
these devices operate in an RGB space, they may have very different primaries 
and white points. An image specified in some RGB space may therefore appear
very different to us, depending on the device on which we display it.

This is clearly an undesirable situation, resulting from a lack of color
management. However, if the image is specified in a known RGB color space, it can
first be converted to XYZ, which is device independent, and then subsequently it 
can be converted to the RGB space of the device on which it will be displayed. 
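
The two-step conversion through XYZ can be collapsed into a single matrix by multiplying the two transforms. The sketch below assumes linear (not nonlinearly encoded) RGB values and uses the sRGB and Adobe RGB (1998) matrices from Figure 21.10 as example source and target spaces; the helper names are hypothetical.

```python
# RGB-to-XYZ matrix of the source space (sRGB, Figure 21.10).
SRGB_TO_XYZ = [[0.4124, 0.3576, 0.1805],
               [0.2126, 0.7152, 0.0722],
               [0.0193, 0.1192, 0.9505]]

# XYZ-to-RGB matrix of the target space (Adobe RGB (1998), Figure 21.10).
XYZ_TO_ADOBE = [[2.0414, -0.5649, -0.3447],
                [-0.9693, 1.8760, 0.0416],
                [0.0134, -0.1184, 1.0154]]


def mat_mul(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


def mat_vec(m, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


# Source RGB -> XYZ -> target RGB, combined into one matrix.
SRGB_TO_ADOBE = mat_mul(XYZ_TO_ADOBE, SRGB_TO_XYZ)

white = mat_vec(SRGB_TO_ADOBE, [1.0, 1.0, 1.0])
red = mat_vec(SRGB_TO_ADOBE, [1.0, 0.0, 0.0])
print([round(v, 3) for v in white])
print([round(v, 3) for v in red])
```

Any nonlinear encoding has to be removed before the matrix is applied and reapplied afterwards using the target space's own nonlinearity.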

There are several other RGB color spaces that are well defined. They each
consist of a linear matrix transform followed by a nonlinear transform, akin to the
aforementioned sRGB color space. The nonlinear transform can be parameterized
as follows:

\[
\begin{aligned}
R_{\mathrm{nonlinear}} &=
\begin{cases}
(1 + f)\, R^{\gamma} - f & t < R \le 1, \\
s R & 0 \le R \le t;
\end{cases} \\
G_{\mathrm{nonlinear}} &=
\begin{cases}
(1 + f)\, G^{\gamma} - f & t < G \le 1, \\
s G & 0 \le G \le t;
\end{cases} \\
B_{\mathrm{nonlinear}} &=
\begin{cases}
(1 + f)\, B^{\gamma} - f & t < B \le 1, \\
s B & 0 \le B \le t.
\end{cases}
\end{aligned}
\]

The parameters $s$, $f$, $t$, and $\gamma$ together with primaries and white point
specify a class of RGB color spaces that are used in various industries. Several
common transformations are listed in Figure 21.10.
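
Because the nonlinearity differs between spaces only in its four parameters, one function can serve all of them. The sketch below is a hypothetical parameterization using a small dataclass; the two parameter sets are the sRGB and HDTV rows of Figure 21.10.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferParams:
    """Parameters of the piecewise nonlinear transform."""
    gamma: float  # exponent of the power-law segment
    f: float      # offset
    s: float      # slope of the linear segment
    t: float      # threshold between the two segments


SRGB = TransferParams(gamma=1.0 / 2.4, f=0.055, s=12.92, t=0.0031308)
HDTV = TransferParams(gamma=0.45, f=0.099, s=4.5, t=0.018)


def encode(value, params):
    """Apply the parameterized nonlinear transform to a value in [0, 1]."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must lie in [0, 1]")
    if value <= params.t:
        return params.s * value
    return (1.0 + params.f) * value ** params.gamma - params.f


for name, params in (("sRGB", SRGB), ("HDTV", HDTV)):
    print(name, [round(encode(v, params), 4) for v in (0.0, 0.01, 0.5, 1.0)])
```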


Color space         XYZ to RGB matrix             RGB to XYZ matrix          Nonlinear transform

sRGB                 3.2405 -1.5371 -0.4985       0.4124 0.3576 0.1805       γ = 1/2.4 ≈ 0.42
                    -0.9693  1.8760  0.0416       0.2126 0.7152 0.0722       f = 0.055
                     0.0556 -0.2040  1.0572       0.0193 0.1192 0.9505       s = 12.92
                                                                             t = 0.0031308

Adobe RGB (1998)     2.0414 -0.5649 -0.3447       0.5767 0.1856 0.1882       γ ≈ 1/2.2
                    -0.9693  1.8760  0.0416       0.2974 0.6273 0.0753       f = N.A.
                     0.0134 -0.1184  1.0154       0.0270 0.0707 0.9911       s = N.A.
                                                                             t = N.A.

HDTV (HD-CIF)        3.2405 -1.5371 -0.4985       0.4124 0.3576 0.1805       γ = 0.45
                    -0.9693  1.8760  0.0416       0.2126 0.7152 0.0722       f = 0.099
                     0.0556 -0.2040  1.0572       0.0193 0.1192 0.9505       s = 4.5
                                                                             t = 0.018

NTSC (1953)/         1.9100 -0.5325 -0.2882       0.6069 0.1735 0.2003       γ = 0.45
ITU-R BT.601-4      -0.9847  1.9992 -0.0283       0.2989 0.5866 0.1145       f = 0.099
                     0.0583 -0.1184  0.8976       0.0000 0.0661 1.1162       s = 4.5
                                                                             t = 0.018

PAL/SECAM            3.0629 -1.3932 -0.4758       0.4306 0.3415 0.1783       γ = 0.45
                    -0.9693  1.8760  0.0416       0.2220 0.7066 0.0713       f = 0.099
                     0.0679 -0.2289  1.0694       0.0202 0.1296 0.9391       s = 4.5
                                                                             t = 0.018

SMPTE-C              3.5054 -1.7395 -0.5440       0.3936 0.3652 0.1916       γ = 0.45
                    -1.0691  1.9778  0.0352       0.2124 0.7010 0.0865       f = 0.099
                     0.0563 -0.1970  1.0502       0.0187 0.1119 0.9582       s = 4.5
                                                                             t = 0.018

SMPTE-240M           2.042  -0.565  -0.345        0.567  0.190  0.193        γ = 0.45
                    -0.894   1.815   0.032        0.279  0.643  0.077        f = 0.1115
                     0.064  -0.129   0.912        0.000  0.073  1.016        s = 4.0
                                                                             t = 0.0228

Wide Gamut           1.4625 -0.1845 -0.2734       0.7164 0.1010 0.1468       γ = N.A.
                    -0.5228  1.4479  0.0681       0.2587 0.7247 0.0166       f = N.A.
                     0.0346 -0.0958  1.2875       0.0000 0.0512 0.7740       s = N.A.
                                                                             t = N.A.

Figure 21.10. Transformations for standard RGB color spaces (after (Pascale, 2003)).


21.2.3 LMS Cone Space

The aforementioned cone signals can be expressed in terms of the CIE XYZ color
space. The matrix transforms to compute LMS signals from XYZ and vice versa
are given by

\[
\begin{bmatrix} L \\ M \\ S \end{bmatrix}
=
\begin{bmatrix}
0.38971 & 0.68898 & -0.07868 \\
-0.22981 & 1.18340 & 0.04641 \\
0.00000 & 0.00000 & 1.00000
\end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix};
\]

\[
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}
=
\begin{bmatrix}
1.91019 & -1.11214 & 0.20195 \\
0.37095 & 0.62905 & 0.00000 \\
0.00000 & 0.00000 & 1.00000
\end{bmatrix}
\begin{bmatrix} L \\ M \\ S \end{bmatrix}.
\]

This transform is known as the Hunt-Pointer-Estevez transform (Hunt, 2004) and
is used in chromatic adaptation transforms as well as in color appearance
modeling.
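
A short sketch of the conversion in both directions is given below, with hypothetical function names. Because the two matrices are rounded to five decimals, a round trip reproduces the input only approximately.

```python
# Hunt-Pointer-Estevez matrices as listed in the text (rounded values).
XYZ_TO_LMS = [[0.38971, 0.68898, -0.07868],
              [-0.22981, 1.18340, 0.04641],
              [0.00000, 0.00000, 1.00000]]

LMS_TO_XYZ = [[1.91019, -1.11214, 0.20195],
              [0.37095, 0.62905, 0.00000],
              [0.00000, 0.00000, 1.00000]]


def _mat_vec(m, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def xyz_to_lms(xyz):
    """Convert XYZ tristimulus values to LMS cone signals."""
    return _mat_vec(XYZ_TO_LMS, xyz)


def lms_to_xyz(lms):
    """Convert LMS cone signals back to XYZ tristimulus values."""
    return _mat_vec(LMS_TO_XYZ, lms)


xyz = [0.9505, 1.0000, 1.0890]
lms = xyz_to_lms(xyz)
print([round(v, 4) for v in lms])
print([round(v, 4) for v in lms_to_xyz(lms)])
```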


21.2.4 CIE 1976 L*a*b* 

Color opponent spaces are characterized by one achromatic channel (luminance),
as well as two channels encoding color opponency. These are frequently
red-green and yellow-blue channels. These color opponent channels thus encode
two chromaticities along one axis, which can have both positive and negative
values. For instance, a red-green channel encodes red for positive values and
green for negative values. The value zero encodes a special case: neutral, which
is neither red nor green. The yellow-blue channel works in much the same way.

As at least two colors are encoded on each of the two chromatic axes, it is not 
possible to encode a mixture of red and green. Neither is it possible to encode 
yellow and blue simultaneously. While this may seem a disadvantage, it is known 
that the human visual system computes similar attributes early in the visual
pathway. As a result, humans are not able to perceive colors that are
simultaneously red and green, or yellow and blue. We do not see anything resembling
reddish-green, or yellowish-blue. We are, however, able to perceive mixtures of colors
such as yellowish-red (orange) or greenish-blue, as these are encoded across the 
chromatic channels. 

The most relevant color opponent system for computer graphics is the CIE 
1976 L*a*b* color model. It is a perceptually more or less uniform color space, 
useful, among other things, for the computation of color differences. It is also 
known as CIELAB. 

The inputs to CIELAB are the stimulus $(X, Y, Z)$ tristimulus values as well as
the tristimulus values of a diffuse white reflecting surface that is lit by a known
illuminant, $(X_n, Y_n, Z_n)$. CIELAB therefore goes beyond being an ordinary
color space, as it takes into account a patch of color in the context of a known
illumination. It can thus be seen as a rudimentary color appearance space.

The three channels defined in CIELAB are $L^*$, $a^*$, and $b^*$. The $L^*$ channel
encodes the lightness of the color, i.e., the perceived reflectance of a patch with
tristimulus value $(X, Y, Z)$. The $a^*$ and $b^*$ are chromatic opponent channels.
The transform between XYZ and CIELAB is given by

\[
\begin{bmatrix} L^* \\ a^* \\ b^* \end{bmatrix}
=
\begin{bmatrix}
0 & 116 & 0 & -16 \\
500 & -500 & 0 & 0 \\
0 & 200 & -200 & 0
\end{bmatrix}
\begin{bmatrix} f(X/X_n) \\ f(Y/Y_n) \\ f(Z/Z_n) \\ 1 \end{bmatrix}.
\]

The function $f$ is defined as

\[
f(r) =
\begin{cases}
\sqrt[3]{r} & \text{for } r > 0.008856, \\
7.787\, r + \dfrac{16}{116} & \text{for } r \le 0.008856.
\end{cases}
\]

As can be seen from this formulation, the chromatic channels do depend on the
luminance $Y$. Although this is perceptually accurate, it means that we cannot plot
the values of $a^*$ and $b^*$ in a chromaticity diagram. The lightness $L^*$ is
normalized between 0 and 100 for black and white. Although the $a^*$ and $b^*$
channels are not explicitly constrained, they are typically in the range
$[-128, 128]$.

As CIELAB is approximately perceptually linear, it is possible to take two
colors, convert them to CIELAB, and then estimate the perceived color difference
by computing the Euclidean distance between them. This leads to the following
color difference formula:

\[
\Delta E^*_{ab} = \sqrt{(\Delta L^*)^2 + (\Delta a^*)^2 + (\Delta b^*)^2}.
\]

The letter $E$ stands for difference in sensation (in German, Empfindung) (Judd,
1932).
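
The forward transform and the color difference formula above translate directly into code. The following is a minimal sketch with hypothetical function names; the white point used in the example is the ITU-R BT.709 white quoted earlier in the chapter.

```python
import math


def _f(r):
    """Piecewise compressive function used by CIELAB."""
    if r > 0.008856:
        return r ** (1.0 / 3.0)
    return 7.787 * r + 16.0 / 116.0


def xyz_to_lab(xyz, white):
    """Convert XYZ to (L*, a*, b*) relative to the reference white."""
    fx, fy, fz = (_f(c / n) for c, n in zip(xyz, white))
    return (116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz))


def delta_e_ab(lab1, lab2):
    """Euclidean color difference between two CIELAB colors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(lab1, lab2)))


WHITE = (95.05, 100.00, 108.90)

lab_white = xyz_to_lab(WHITE, WHITE)
lab_gray = xyz_to_lab(tuple(0.2 * c for c in WHITE), WHITE)
print(lab_white)
print(lab_gray)
print(delta_e_ab(lab_white, lab_gray))
```

The reference white maps to $L^* = 100$ with $a^* = b^* = 0$, and any scaled copy of the white remains neutral.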

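Finally, the inverse transform between CIELAB and XYZ is given by

\[
\begin{aligned}
X &= X_n
\begin{cases}
\left( \dfrac{L^*}{116} + \dfrac{a^*}{500} + \dfrac{16}{116} \right)^3 & \text{if } L^* > 7.9996, \\
\dfrac{1}{7.787} \left( \dfrac{L^*}{116} + \dfrac{a^*}{500} \right) & \text{if } L^* \le 7.9996,
\end{cases} \\
Y &= Y_n
\begin{cases}
\left( \dfrac{L^*}{116} + \dfrac{16}{116} \right)^3 & \text{if } L^* > 7.9996, \\
\dfrac{1}{7.787}\, \dfrac{L^*}{116} & \text{if } L^* \le 7.9996,
\end{cases} \\
Z &= Z_n
\begin{cases}
\left( \dfrac{L^*}{116} - \dfrac{b^*}{200} + \dfrac{16}{116} \right)^3 & \text{if } L^* > 7.9996, \\
\dfrac{1}{7.787} \left( \dfrac{L^*}{116} - \dfrac{b^*}{200} \right) & \text{if } L^* \le 7.9996.
\end{cases}
\end{aligned}
\]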
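
A sketch of the inverse transform follows. One deliberate difference from the formulas above should be noted as an assumption of this sketch: instead of switching all three channels on $L^*$, each channel selects its branch from its own intermediate value, which coincides with the $L^*$ test for $Y$ and exactly inverts the forward function $f$ for $X$ and $Z$ as well. The function names are hypothetical.

```python
def _f_inverse(f_value):
    """Invert the CIELAB function f; cube branch applies above 0.008856."""
    cubed = f_value ** 3
    if cubed > 0.008856:
        return cubed
    return (f_value - 16.0 / 116.0) / 7.787


def lab_to_xyz(lab, white):
    """Convert (L*, a*, b*) back to XYZ relative to the reference white."""
    L, a, b = lab
    fy = (L + 16.0) / 116.0   # equals L*/116 + 16/116
    fx = fy + a / 500.0
    fz = fy - b / 200.0
    return tuple(n * _f_inverse(f) for n, f in zip(white, (fx, fy, fz)))


WHITE = (95.05, 100.00, 108.90)

print(lab_to_xyz((100.0, 0.0, 0.0), WHITE))  # reference white
print(lab_to_xyz((50.0, 0.0, 0.0), WHITE))   # mid gray, Y of about 18.4
print(lab_to_xyz((0.0, 0.0, 0.0), WHITE))    # black
```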


21.3 Chromatic Adaptation 

The CIELAB color space just described takes as input both a tristimulus value of 
the stimulus and the tristimulus value of light reflected off a white diffuse patch. 
As such, it forms the beginnings of a system in which the viewing environment is 
taken into account. 

The environment in which we observe objects and images has a large influence 
on how we perceive those objects. The range of viewing environments that we 
encounter in daily life is very large, from sunlight to starlight and from candlelight 
to fluorescent light. The lighting conditions not only constitute a very large range 
in the amount of light that is present, but also vary greatly in the color of the 
emitted light. 
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The human visual system accommodates these changes in the environment 
through a process called adaptation. Three different types of adaptation can be 
distinguished, namely light adaptation, dark adaptation, and chromatic adaptation. 
Light adaptation refers to the changes that occur when we move from a very dark 
to a very light environment. When this happens, at first we are dazzled by the 
light, but soon we adapt to the new situation and begin to distinguish objects in 
our environment. Dark adaptation refers to the opposite—when we go from a 
light environment to a dark environment. At first, we see very little, but after a 
given amount of time, details will start to emerge. The time needed to adapt to 
the dark is generally much longer than for light adaptation. 

Chromatic adaptation refers to our ability to adapt to, and largely ignore,
variations in the color of the illumination. Chromatic adaptation is, in essence, the
biological equivalent of the white balancing operation that is available on most
modern cameras. The human visual system effectively normalizes the viewing
conditions to present a visual experience that is fairly consistent. Thus, we
exhibit a certain amount of color constancy: object reflectances appear relatively
constant despite variations in illumination.

Although we are able to largely ignore changes in viewing environment, we 
are not able to do so completely. For instance, colors appear much more colorful
on a sunny day than they do on a cloudy day. Although the appearances
have changed, we do not assume that object reflectances themselves have actually 
changed their physical properties. We thus understand that the lighting conditions 
have influenced the overall color appearance. 

Nonetheless, color constancy does apply to chromatic content. Chromatic 
adaptation allows white objects to appear white for a large number of lighting 
conditions, as shown in Figure 21.11. 



Figure 21.11. A series of light sources plotted in the CIE $u'v'$ chromaticity diagram. A white
piece of paper illuminated by any of these light sources maintains a white color appearance.
(See also Plate XXXI.)

Figure 21.12. An example of von Kries-style independent photoreceptor gain control. The
relative cone responses (solid lines) and the relative adapted cone responses to CIE illuminant
A (dashed) are shown. The separate patch of color represents CIE illuminant A rendered into
the sRGB color space. (See also Plate XXXII.)


Computational models of chromatic adaptation tend to focus on the gain control
mechanism in the cones. One of the simplest models assumes that each cone
adapts independently to the energy that it absorbs. This means that different cone
types adapt differently dependent on the spectrum of the light being absorbed.
Such adaptation can then be modeled as an adaptive and independent rescaling of
the cone signals:

\[
\begin{aligned}
L_a &= \alpha L, \\
M_a &= \beta M, \\
S_a &= \gamma S,
\end{aligned}
\]

where $(L_a, M_a, S_a)$ are the chromatically adapted cone signals, and $\alpha$,
$\beta$, and $\gamma$ are the independent gain controls which are determined by
the viewing environment. This type of independent adaptation is also known as
von Kries adaptation. An example is shown in Figure 21.12.

The adapting illumination can be measured off a white surface in the scene. In
the ideal case, this would be a Lambertian surface. In a digital image, the adapting
illumination can also be approximated as the maximum tristimulus values of the
scene. The light measured or computed in this manner is the adapting white, given
by $(L_w, M_w, S_w)$. Von Kries adaptation is then simply a scaling by the
reciprocal of the adapting white, carried out in cone response space:

\[
\begin{bmatrix} L_a \\ M_a \\ S_a \end{bmatrix}
=
\begin{bmatrix}
1/L_w & 0 & 0 \\
0 & 1/M_w & 0 \\
0 & 0 & 1/S_w
\end{bmatrix}
\begin{bmatrix} L \\ M \\ S \end{bmatrix}.
\]

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In many cases, we are interested in what stimulus should be generated under 
one illumination to match a given color under a different illumination. For
example, if we have a colored patch illuminated by daylight, we may ask ourselves
what tristimulus values should be generated to create a matching color patch that 
will be illuminated by incandescent light. 

We are thus interested in computing corresponding colors, which can be
achieved by cascading two chromatic adaptation calculations. In essence, the
above von Kries transform divides out the adapting illuminant (in our example,
the daylight illumination). If we subsequently multiply in the incandescent
illuminant, we have computed a corresponding color. If the two illuminants are
given by $(L_{w,1}, M_{w,1}, S_{w,1})$ and $(L_{w,2}, M_{w,2}, S_{w,2})$, the
corresponding color $(L_c, M_c, S_c)$ is given by

\[
\begin{bmatrix} L_c \\ M_c \\ S_c \end{bmatrix}
=
\begin{bmatrix}
L_{w,2} & 0 & 0 \\
0 & M_{w,2} & 0 \\
0 & 0 & S_{w,2}
\end{bmatrix}
\begin{bmatrix}
1/L_{w,1} & 0 & 0 \\
0 & 1/M_{w,1} & 0 \\
0 & 0 & 1/S_{w,1}
\end{bmatrix}
\begin{bmatrix} L \\ M \\ S \end{bmatrix}.
\]


There are several more complicated, and generally more accurate, chromatic
adaptation transforms in existence (Reinhard et al., 2008). However, the simple
von Kries model remains remarkably effective in modeling chromatic adaptation
and can thus be used to achieve white balancing in digital images.
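
Since both von Kries matrices are diagonal, the adaptation and the corresponding-color computation reduce to per-channel divisions and multiplications. The sketch below uses hypothetical function names and made-up LMS white values purely for illustration; real values would come from measuring or estimating the adapting whites.

```python
def von_kries_adapt(lms, white):
    """Divide out the adapting white, channel by channel."""
    return tuple(c / w for c, w in zip(lms, white))


def corresponding_color(lms, white_1, white_2):
    """Map a color seen under white_1 to its match under white_2."""
    adapted = von_kries_adapt(lms, white_1)
    return tuple(a * w for a, w in zip(adapted, white_2))


# Hypothetical adapting whites in LMS space (illustrative numbers only).
daylight_white = (0.95, 1.00, 1.05)
incandescent_white = (1.10, 0.95, 0.40)

patch = (0.475, 0.40, 0.21)
print(von_kries_adapt(daylight_white, daylight_white))
print(corresponding_color(patch, daylight_white, incandescent_white))
```

By construction, the first adapting white always maps onto the second, which is the behavior expected of a white balancing operation.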

The importance of chromatic adaptation in the context of rendering is that we
have moved one step closer to taking into account the viewing environment of the
observer, without having to correct for it by adjusting the scene and rerendering
our imagery. Instead, we can model and render our scenes, and then, as an image
post-process, correct for the illumination of the viewing environment. To ensure
that white balancing does not introduce artifacts, however, it is important to ensure
that the image is rendered to a floating-point format. If rendered to traditional
8-bit image formats, the chromatic adaptation transform may amplify quantization
errors.


21.4 Color Appearance

While colorimetry allows us to accurately specify and communicate color in a 
device-independent manner, and chromatic adaptation allows us to predict color 
matches across changes in illumination, these tools are still insufficient to describe 
what colors actually look like. 

To predict the actual perception of an object, we need to know more information
about the environment and take that information into account. The human
visual system is constantly adapting to its environment, which means that the
perception of color will be strongly influenced by such changes. Color appearance
models take into account measurements of the stimulus itself, as well as the
viewing environment. This means that the resulting description of color is
independent of viewing condition.

The importance of color appearance modeling can be seen in the following 
example. Consider an image being displayed on an LCD screen. When making 
a print of the same image and viewing it in a different context, more often than 
not the image will look markedly different. Color appearance models can be 
used to predict the changes required to generate an accurate cross-media color 
reproduction (Fairchild, 2005). 

Although color appearance modeling offers important tools for color reproduction,
actual implementations tend to be relatively complicated and cumbersome
in practical use. It can be anticipated that this situation may change over
time. However, until then, we leave their description to more specialized
textbooks (Fairchild, 2005).

Notes 

Of all the books on color theory, Reinhard et al.’s work (Reinhard et al., 2008) is
most directly geared towards engineering disciplines, including computer graphics,
computer vision, and image processing. Other general introductions to color
theory are given by Berns (Berns, 2000) and Stone (Stone, 2003). Wyszecki and
Stiles have produced a comprehensive volume of data and formulae, forming an
indispensable reference work (Wyszecki & Stiles, 2000). For color reproduction,
we recommend Hunt’s book (Hunt, 2004). Color appearance models are
comprehensively described in Fairchild’s book (Fairchild, 2005). For color issues
related to video and HDTV, Poynton’s book is essential (Poynton, 2003).



Visual Perception 


The ultimate purpose of computer graphics is to produce images for viewing by 
people. Thus, the success of a computer graphics system depends on how well it 
conveys relevant information to a human observer. The intrinsic complexity of the 
physical world and the limitations of display devices make it impossible to present 
a viewer with the identical patterns of light that would occur when looking at a 
natural environment. When the goal of a computer graphics system is physical 
realism, the best we can hope for is that the system be perceptually effective:
displayed images should “look” as intended. For applications such as technical 
illustration, it is often desirable to visually highlight relevant information and 
perceptual effectiveness becomes an explicit requirement. 

Artists and illustrators have developed empirically a broad range of tools and 
techniques for effectively conveying visual information. One approach to improv¬ 
ing the perceptual effectiveness of computer graphics is to utilize these methods 
in our automated systems. A second approach builds directly on knowledge of 
the human vision system by using perceptual effectiveness as an optimization
criterion in the design of computer graphics systems. These two approaches are not
completely distinct. Indeed, one of the first systematic examinations of visual 
perception is found in the notebooks of Leonardo da Vinci. 

The remainder of this chapter provides a partial overview of what is known 
about visual perception in people. The emphasis is on aspects of human vision 
that are most relevant to computer graphics. The human visual system is
extremely complex in both its operation and its architecture. A chapter such as this
can at best provide a summary of key points, and it is important to avoid
overgeneralizing from what is presented here. More in-depth treatments of visual
perception can be found in Wandell (1995) and Palmer (1999); Gregory (1997) and
Yantis (2000) provide additional useful information. A good computer vision ref¬ 
erence such as Forsyth and Ponce (2002) is also helpful. It is important to note 
that despite over 150 years of intensive research, our knowledge of many aspects 
of vision is still very limited and imperfect. 


Light: 

• travels far 

• travels fast 

• travels in straight lines 

• interacts with stuff 

• bounces off things 

• is produced in nature 

• has lots of energy 

—Steven Shafer 

Figure 22.1. The nature of 
light makes vision a power¬ 
ful sense. 


22.1 Vision Science 

Vision is generally agreed to be the most powerful of the senses in humans. 
Vision produces more useful information about the world than does hearing, 
touch, smell, or taste. This is a direct consequence of the physics of light (Fig¬ 
ure 22.1). Illumination is pervasive, especially during the day but also at night 
due to moonlight, starlight, and artificial sources. Surfaces reflect a substantial 
portion of incident illumination and do so in ways that are idiosyncratic to par¬ 
ticular materials and that are dependent on the shape of the surface. The fact 
that light (mostly) travels in straight lines through the air allows vision to acquire 
information from distant locations. 

The study of vision has a long and rich history. Much of what we know 
about the eye traces back to the work of philosophers and physicists in the 1600s. 
Starting in the mid-1800s, there was an explosion of work by perceptual psy¬ 
chologists exploring the phenomenology of vision and proposing models of how 
vision might work. The mid-1900s saw the start of modern neuroscience, which 
investigates both the fine-scale workings of individual neurons and the large-scale 
architectural organization of the brain and nervous system. A substantial portion 
of neuroscience research has focused on vision. More recently, computer science 
has contributed to the understanding of visual perception by providing tools for 
precisely describing hypothesized models of visual computations and by allow¬ 
ing empirical examination of computer vision programs. The term vision science 
was coined to refer to the multidisciplinary study of visual perception involving 
perceptual psychology, neuroscience, and computational analysis. 

Vision science views the purpose of vision as producing information about 
objects, locations, and events in the world from imaged patterns of light reach¬ 
ing the viewer. Psychologists use the term distal stimulus to refer to the physical 
world under observation and proximal stimulus to refer to the retinal image.¹
Using this terminology, the function of vision is to generate a description of aspects
of the distal stimulus given the proximal stimulus. Visual perception is said to be
veridical when the description that is produced accurately reflects the real world.
In practice, it makes little sense to think of these descriptions of objects, locations,
and events in isolation. Rather, vision is better understood in the context of the
motor and cognitive functions that it serves.

¹ In computer vision, the term scene is often used to refer to the external world, while the term
image is used to refer to the projection of the scene onto a sensing plane.


22.2 Visual Sensitivity 

Vision systems create descriptions of the visual environment based on properties 
of the incident illumination. As a result, it is important to understand what prop¬ 
erties of incident illumination the human vision system can actually detect. One 
critical observation about the human vision system is that it is primarily sensi¬ 
tive to patterns of light rather than being sensitive to the absolute magnitude of 
light energy. The eye does not operate as a photometer. Instead, it detects spatial, 
temporal, and spectral patterns in the light imaged on the retina, and information
about these patterns of light forms the basis for all of visual perception.

There is a clear ecological utility to the vision system’s sensitivity to variations
in illumination over space and time. Being able to accurately sense changes in the
environment is crucial to our survival.² A system that measures changes in
light energy rather than the magnitude of the energy itself also makes engineering
sense, since it makes it easier to detect patterns of light over large ranges in light
intensity. It is a good thing for computer graphics that vision operates in this
manner. Display devices are physically limited in their ability to project light
with the power and dynamic range typical of natural scenes. Graphical displays
would not be effective if they needed to produce the identical patterns of light as
the corresponding physical world. Fortunately, all that is required is that displays
be able to produce similar patterns of spatial and temporal change to the real
world.

² It is sometimes said that the primary goals of vision are to support eating, avoiding being eaten,
reproduction, and avoidance of catastrophe while moving. Thinking about vision as a goal-directed
activity is often useful, but needs to be done at a more detailed level.


22.2.1 Brightness and Contrast

In bright light, the human visual system is capable of distinguishing gratings
consisting of high-contrast parallel light and dark bars as fine as 50-60 cycles/degree.
(In this case, a “cycle” consists of an adjacent pair of light and dark bars.) For
comparison, the best currently available LCD computer monitor, at a normal
viewing distance, can display patterns as fine as about 20 cycles/degree. The
minimum contrast difference at an edge detectable by the human visual system
in bright light is about 1% of the average luminance across the edge. In most
8-bit displays, differences of a single gray level are often noticeable over at least
a portion of the range of intensities due to the nature of the mapping from gray
levels to actual display luminance.

Figure 22.2. The contrast between stripes increases in a constant manner from top to
bottom, yet the threshold of visibility varies with frequency.
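The display figure quoted above can be checked with a little trigonometry. The following is a minimal sketch, not a measurement: the function names are illustrative, and the 0.25 mm pixel pitch and 600 mm viewing distance are assumed example values rather than numbers taken from the text. It treats one cycle as one light pixel plus one dark pixel.

```python
import math

def pixels_per_degree(pixel_pitch_mm, viewing_distance_mm):
    """Pixels spanned by one degree of visual angle near the screen center."""
    # Width on the screen subtended by one degree at the given distance.
    mm_per_degree = 2.0 * viewing_distance_mm * math.tan(math.radians(0.5))
    return mm_per_degree / pixel_pitch_mm

def max_cycles_per_degree(pixel_pitch_mm, viewing_distance_mm):
    """Finest displayable grating: each cycle needs one light and one dark pixel."""
    return pixels_per_degree(pixel_pitch_mm, viewing_distance_mm) / 2.0

# Assumed example values: 0.25 mm pixel pitch viewed from 600 mm.
print(max_cycles_per_degree(0.25, 600.0))
```

Under these assumptions the result is roughly 21 cycles/degree, in line with the figure of about 20 cycles/degree given above and well short of the 50-60 cycles/degree limit of human vision in bright light.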

Characterizing the ability of the visual system to detect fine-scale patterns (visual
acuity) and to detect changes in brightness is considerably more complicated
than for cameras and similar image acquisition devices. As shown in Figure 22.2, 
there is an interaction between contrast and acuity in human vision. In the figure, 
the scale of the pattern decreases from left to right while the contrast increases 
from top to bottom. If you view the figure at a normal viewing distance, it will 
be clear that the lowest contrast at which a pattern is visible is a function of the 
spatial frequency of the pattern. 

There is a linear relationship between the intensity of light L reaching the eye 
from a particular surface point in the world, the intensity of light I illuminating 
that surface point, and the reflectivity R of the surface at the point being observed: 


$$L = \alpha I \cdot R, \tag{22.1}$$

where $\alpha$ is dependent on the relationship between the surface geometry, the
pattern of incident illumination, and the viewing direction. While the eye is only
able to directly measure L, human vision is much better at estimating R than L.
To see this, view Figure 22.3 in bright direct light. Use your hand to shadow one
of the patterns, leaving the other directly illuminated. While the light reflected off
of the two patterns will be significantly different, the apparent brightness of the
two center squares will seem nearly the same. The term lightness is often used
to describe the apparent brightness of a surface, as distinct from its actual
luminance. In many situations, lightness is invariant to large changes in illumination,
a phenomenon referred to as lightness constancy.

Figure 22.3. Lightness constancy. Cast a shadow over one of the patterns with your hand
and notice that the apparent brightness of the two center squares remains nearly the same.

The mechanisms by which the human visual system achieves lightness con¬ 
stancy are not well understood. As shown in Figure 22.2, the vision system is 
relatively insensitive to slowly varying patterns of light, which may serve to dis¬ 
count the effects of slowly varying illumination. Apparent brightness is affected 
by the brightness of surrounding regions (Figure 22.4). This can aid lightness 
constancy when regions are illuminated dissimilarly. While this simultaneous
contrast effect is often described as a modification of the perceived lightness of
one region based on contrasting brightness in the surrounding region, it is actually
much more complicated than that (Figures 22.5 and 22.6). For more on lightness
perception, see Gilchrist et al. (1999) and Adelson (1999).

Figure 22.4. (a) Simultaneous contrast: the apparent brightness of the center bar is affected
by the brightness of the surrounding area; (b) the same bar without a variable surround.

Figure 22.5. The Munker-White illusion shows the complexity of simultaneous contrast. In
Figure 22.4, the central region looked lighter when the surrounding area was darker. In (a),
the gray strips on the left look lighter than the gray strips on the right, even though they are
nearly surrounded by regions of white; (b) shows the gray strips without the black lines.

Figure 22.6. The perception of lightness is affected by the perception of 3D
structure. The two surfaces marked (a) have the same brightness, as do the two
surfaces marked (b) (after Adelson (1999)).

While the visual system largely ignores slowly varying intensity patterns, it
is extremely sensitive to edges consisting of lines of discontinuity in brightness.
Edges in imaged light intensity often correspond to surface boundaries or other
important features in the environment (Figure 22.7). The vision system can also
detect localized differences in motion, stereo disparity, texture, and several other
image properties. The vision system has very little ability, however, to detect
spatial discontinuities in color when not accompanied by differences in one of
these other properties.

Figure 22.7. (a) Original gray scale image; (b) image edges, which are lines of high spatial
variability in some direction.

Figure 22.8. The visual system sometimes sees “edges” even when there are no sharp
discontinuities in brightness, as is the case at the right side of the central pattern in this
image.

Perception of edges seems to interact with perception of form. While edges 
give the visual system the information it needs to recognize shapes, slowly varying 
brightness can appear as a sharp edge if the resulting edge creates a more complete 
form (Figure 22.8). Figure 22.9 shows a subjective contour, an extreme form of 
this effect in which a closed contour is seen even though no such contour exists 
in the actual image. Finally, the vision system’s sensitivity to edges also appears 
to be part of the mechanism involved in lightness perception. Note that the region 
enclosed by the subjective contour in Figure 22.9 appears a bit brighter than the 
surrounding area of the page. Figure 22.10 shows a different interaction between 
edges and lightness. In this case, a particular brightness profile at the edge has 
a dramatic effect on the apparent brightness of the surfaces to either side of the 
edge. 



Figure 22.9. Sometimes, the visual system will “see” subjective contours without any
associated change in brightness.

Figure 22.10. Perceived lightness depends more on local contrast at edges than on
brightness across surfaces. Try covering the vertical edge in the middle of the figure with a pencil.
This figure is an instance of the Craik-O’Brien-Cornsweet illusion.


As indicated above, people can detect differences in the brightness between 
two adjacent regions if the difference is at least 1% of the average brightness. 
This is an example of Weber’s law, which states that there is a constant ratio
between the just noticeable difference (jnd) in a stimulus and the magnitude of
the stimulus:

$$\frac{\Delta I}{I} = k_1, \tag{22.2}$$

where $I$ is the magnitude of the stimulus, $\Delta I$ is the magnitude of the just
noticeable difference, and $k_1$ is a constant particular to the stimulus. Weber’s law was
postulated in 1846 and still remains a useful characterization of many perceptual
effects. Fechner’s law, proposed in 1860, generalized Weber’s law in a way that
allowed for the description of the strength of any sensory experience, not just
jnd’s:

$$S = k_2 \log(I), \tag{22.3}$$

where $S$ is the perceptual strength of the sensory experience, $I$ is the physical
magnitude of the corresponding stimulus, and $k_2$ is a scaling constant specific to
the stimulus. Current practice is to model the association between perceived and
actual strength of a stimulus using a power function (Stevens’s law):

$$S = k_3 I^b, \tag{22.4}$$

where $S$ and $I$ are as before, $k_3$ is another scaling constant, and $b$ is an exponent
specific to the stimulus. For a large number of perceptual quantities involving
vision, $b < 1$. The CIE L*a*b* color space, described elsewhere, uses a
modified Stevens’s law representation to characterize perceptual differences between
brightness values. Note that in the first two characterizations of the perceptual
strength of a stimulus and in Stevens’s law when $b < 1$, changes in the stimulus
when it has a small average magnitude create larger perceptual effects than do the
same physical change in the stimulus when it has a larger magnitude.
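The closing observation can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the function names are made up for this example, the scaling constants are set to 1, the Weber fraction of 0.01 follows the 1% figure quoted earlier, and the exponent b = 0.33 is an arbitrary value chosen to satisfy b < 1.

```python
import math

def weber_jnd(intensity, k1=0.01):
    """Weber's law: the jnd is a constant fraction k1 of the stimulus magnitude."""
    return k1 * intensity

def fechner_strength(intensity, k2=1.0):
    """Fechner's law: perceived strength grows with the logarithm of the stimulus."""
    return k2 * math.log(intensity)

def stevens_strength(intensity, k3=1.0, b=0.33):
    """Stevens's law: perceived strength is a power function (b is an assumed value)."""
    return k3 * intensity ** b

def perceptual_change(law, intensity, delta):
    """Change in perceived strength when the stimulus grows by the same physical delta."""
    return law(intensity + delta) - law(intensity)

for base in (10.0, 100.0):
    print(base,
          weber_jnd(base),
          perceptual_change(fechner_strength, base, 1.0),
          perceptual_change(stevens_strength, base, 1.0))
```

For both the logarithmic and the power-law model, the same physical increment produces a larger perceptual change at the lower base intensity, while the Weber jnd grows in proportion to the base intensity.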

The “laws” described above are not physical constraints on how perception
operates. Rather, they are generalizations about how the perceptual system re¬ 
sponds to particular physical stimuli. In the field of perceptual psychology, the 
quantitative study of the relationships between physical stimuli and their percep¬ 
tual effects is called psychophysics. While psychophysical laws are empirically 
derived observations rather than mechanistic accounts, the fact that so many per¬ 
ceptual effects are well modeled by simple power functions is striking and may 
provide insights into the mechanisms involved. 


22.2.2 Color 

In 1666, Isaac Newton used prisms to show that apparently white sunlight could 
be decomposed into a spectrum of colors and that these colors could be recom¬ 
bined to produce light that appeared white. We now know that light energy is 
made up of a collection of photons, each with a particular wavelength. The spec¬ 
tral distribution of light is a measure of the average energy of the light at each 
wavelength. For natural illumination, the spectral distribution of light reflected 
off of surfaces varies significantly depending on the surface material. Character¬ 
izations of this spectral distribution can therefore provide visual information for 
the nature of surfaces in the environment. 

Most people have a pervasive sense of color when they view the world. Color 
perception depends on the frequency distribution of light, with the visible spec¬ 
trum for humans ranging from a wavelength of about 370 nm to a wavelength of 
about 730 nm (see Plate X). The manner in which the visual system derives a
sense of color from this spectral distribution was first systematically examined in 
1801 and remained extremely controversial for 150 years. The problem is that the 
visual system responds to patterns of spectral distribution very differently than 
patterns of luminance distribution. 

Even accounting for phenomena such as lightness constancy, distinctly different
spatial distributions of luminance almost always look distinctly different. More importantly,
given that the purpose of the visual system is to produce descriptions of the distal
stimulus given the proximal stimulus, perceived patterns of lightness correspond 
at least approximately to patterns of brightness over surfaces in the environment. 
The same is not true of color perception. Many quite different spectral distri¬ 
butions of light can produce a sense of any specific color. Correspondingly, the 
sense that a surface is a specific color provides little direct information about the 
spectral distribution of light coming from the surface. For example, a spectral
distribution consisting of a combination of light at wavelengths of 700 nm and
540 nm, with appropriately chosen relative strengths, will look indistinguishable
from light at the single wavelength of 580 nm. (Perceptually indistinguishable
colors with different spectral compositions are referred to as metamers.) If we see
the color “yellow,” we have no way of knowing if it was generated by one or the
other of these distributions or an infinite family of other spectral distributions. For
this reason, in the context of vision the term color refers to a purely perceptual
quality, not a physical property.

“The history of the investigation of colour vision is remarkable for its acrimony.”

—Richard Gregory (1997)

There are two classes of photoreceptors in the human retina. Cones are in¬ 
volved in color perception, while rods are sensitive to light energy across the 
visible range and do not provide information about color. There are three types of 
cones, each with a different spectral sensitivity (Figure 22.11). S-cones respond 
to short wavelengths in the blue range of the visible spectrum. M-cones respond 
to wavelengths in the middle (greenish) region of the visible spectrum. L-cones 
respond to somewhat longer wavelengths covering the green and red portions of 
the visible spectrum. 

While it is common to describe the three types of cones as red, green, and 
blue, this is neither correct terminology nor does it accurately reflect the cone 
sensitivities shown in Figure 22.11. The L-cones and M-cones are broadly tuned, 
meaning that they respond to a wide range of frequencies. There is also substantial 
overlap between the sensitivity curves of the three cone types. Taken together, 
these two properties mean that it is not possible to reconstruct an approximation 
to the original spectral distribution given the responses of the three cone types. 
This is in contrast to spatial sampling in the retina (and in digital cameras), where
the receptors are narrowly tuned in their spatial sensitivity in order to be able to
detect fine detail in local contrast.

Figure 22.11. Spectral sensitivity of the short, medium, and long cones in the human retina.
(Horizontal axis: wavelength in nanometers, 400 to 700.)

The fact that there are only three types of color sensitive photoreceptors
in the human retina greatly simplifies the task of displaying colors on computer 
monitors and in other graphical displays. Computer monitors display colors as 
a weighted combination of three fixed color distributions. Most often, the three 
colors are a distinct red, a distinct green, and a distinct blue. As a result, in 
computer graphics, color is often represented by a red-green-blue (RGB) triple, 
representing the intensities of red, green, and blue primaries needed to display 
a particular color. Three basis colors are sufficient to display most perceptible 
colors, since appropriately weighted combinations of three appropriately chosen 
colors can produce metamers for these perceptible colors. 
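The claim that three weighted primaries can stand in for another spectral distribution can be sketched as a small linear solve. Everything specific here is an assumption made for illustration: the Gaussian curves are crude stand-ins for the cone sensitivities of Figure 22.11 (not measured data), the primaries are idealized single-wavelength lights at 450, 540, and 610 nm, and the function names are hypothetical.

```python
import math

# Hypothetical Gaussian stand-ins for the S, M, and L cone sensitivities,
# given as (peak wavelength in nm, width in nm). These are NOT measured curves.
CONES = {"S": (440.0, 30.0), "M": (545.0, 40.0), "L": (565.0, 45.0)}

def cone_response(wavelength_nm):
    """Responses [S, M, L] to unit-energy light at a single wavelength."""
    return [math.exp(-0.5 * ((wavelength_nm - peak) / width) ** 2)
            for peak, width in CONES.values()]

def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def metamer_weights(primaries_nm, target_nm):
    """Primary weights whose summed cone responses equal those of the target light."""
    per_primary = [cone_response(p) for p in primaries_nm]
    # Rows are cone types, columns are primaries.
    a = [[per_primary[j][i] for j in range(3)] for i in range(3)]
    t = cone_response(target_nm)
    d = det3(a)
    weights = []
    for j in range(3):
        # Cramer's rule: replace column j with the target response vector.
        aj = [[t[i] if c == j else a[i][c] for c in range(3)] for i in range(3)]
        weights.append(det3(aj) / d)
    return weights

def mixture_response(primaries_nm, weights):
    """Cone responses [S, M, L] to the weighted sum of the primaries."""
    return [sum(w * cone_response(p)[i] for p, w in zip(primaries_nm, weights))
            for i in range(3)]

primaries = [450.0, 540.0, 610.0]
weights = metamer_weights(primaries, 580.0)
print(weights)
print(mixture_response(primaries, weights))
print(cone_response(580.0))
```

The mixture and the 580 nm target produce the same three cone responses, so under this toy model they would be metamers. A solved weight can come out negative, which corresponds to a color that the chosen primaries cannot physically reproduce; this is one reason three primaries cover most, but not all, perceptible colors.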

There are at least two significant problems with the RGB color representation. 
The first is that different monitors have different spectral distributions for their 
red, green, and blue primaries. As a result, perceptually correct color rendition 
involves remapping RGB values for each monitor. This is of course only possible 
if the original RGB values satisfy some well defined standard, which is often not 
the case. See Chapter 21 for more information on this issue. The second problem 
is that RGB values do not define a particular color in a way that corresponds to 
subjective perception. When we see the color “yellow,” we do not have the sense 
that it is made up of equal parts of red and green light. Rather, it looks like a single 
color, with additional properties involving brightness and the “amount” of color. 
Representing color as the output of the S-cones, M-cones, and L-cones is no help 
either, since we have no more phenomenological sense of color as characterized 
by these properties than we do as characterized by RGB display properties. 

There are two different approaches to characterizing color in a way that more 
closely reflects human perception. The various CIE color spaces aim to be
“perceptually uniform” so that the magnitude of the difference in the represented 
values of two colors is proportional to the perceived difference in color (Wyszecki 
& Stiles, 2000). This turns out to be a difficult goal to accomplish, and there 
have been several modifications to the CIE model over the years. Furthermore, 
while one of the dimensions of the CIE color spaces corresponds to perceived 
brightness, the other two dimensions that specify chromaticity have no intuitive 
meaning. 

The second approach to characterizing color in a more natural manner starts 
with the observation that there are three distinct and independent properties that 
dominate the subjective sense of color. Lightness, the apparent brightness of a 
surface, has already been discussed. Saturation refers to the purity or vividness 
of a color. Colors can range from totally unsaturated gray to partially saturated
pastels to fully saturated “pure” colors. The third property, hue, corresponds most
closely to the informal sense of the word “color” and is characterized in a manner 
similar to colors in the visible spectrum, ranging from dark violet to dark red. 
Plate XI shows a plot of the hue-saturation-lightness (HSV) color space. Since 
the relationship between brightness and lightness is both complex and not well 
understood, HSV color spaces almost always use brightness instead of attempting 
to estimate lightness. Unlike wavelengths in the spectrum, however, hue is usu¬ 
ally represented in a manner that reflects the fact that the extremes of the visible 
spectrum are actually similar in appearance (Plate XII). Simple transformations 
exist between RGB and HSV representations of a particular color value. As a 
result, while the HSV color space is motivated by perceptual considerations, it 
contains no more information than does an RGB representation. 
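As a brief illustration of such a transformation, the sketch below uses Python's standard `colorsys` module, which is one common RGB/HSV convention and not necessarily the exact formulation behind the plates. The helper name `round_trip` and the sample colors are chosen only for this example.

```python
import colorsys

def round_trip(rgb):
    """Convert an RGB triple (components in [0, 1]) to HSV and back again."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    return (h, s, v), colorsys.hsv_to_rgb(h, s, v)

# Pure yellow: equal parts red and green, no blue.
hsv, rgb_again = round_trip((1.0, 1.0, 0.0))
print(hsv[0] * 360.0, hsv[1], hsv[2])  # hue in degrees, saturation, value
print(rgb_again)

# A less saturated orange-brown, chosen arbitrarily.
print(round_trip((0.8, 0.4, 0.2)))
```

Because the conversion is invertible (up to floating-point rounding), the round trip recovers the original RGB values, which is the sense in which HSV carries no more information than RGB.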

The hue-saturation-lightness approach to describing color is based on the 
spectral distribution at a single point and so only approximates the perceptual 
response to spectral distributions of light distributed over space. Color percep¬ 
tion is subject to similar constancy and simultaneous contrast effects as is light¬ 
ness/brightness, neither of which are captured in the RGB representation and as 
a result are not captured in the HSV representation. For an example of color 
constancy, look at a piece of white paper indoors under incandescent light and 
outdoors under direct sunlight. The paper will look “white” in both cases, even 
though incandescent light has a distinctly yellow hue and so the light reflected off 
of the paper will also have a yellow hue, while sunlight has a much more uniform 
color spectrum. 

Another aspect of color perception not captured by either the CIE color spaces 
or HSV encoding is the fact that we see a small number of distinct colors when 
looking at a continuous spectrum of visible light (Plate X) or in a naturally oc¬ 
curring rainbow. For most people, the visible spectrum appears to be divided into 
four to six distinct colors: red, yellow, green, and blue, plus perhaps light blue and 
purple. Considering non-spectral colors as well, there are only eleven basic color 
terms commonly used in English: red, green, blue, yellow, black, white, gray, 
orange, purple, brown, and pink. The partitioning of the intrinsically continuous 
space of spectral distributions into a relatively small set of perceptual categories 
associated with well defined linguistic terms seems to be a basic property of per¬ 
ception, not just a cultural artifact (Berlin & Kay, 1969). The exact nature of the 
process, however, is not well understood. 

22.2.3 Dynamic Range 

Natural illumination varies in intensity over 6 orders of magnitude (Figure 22.12). 
The human vision system is able to operate over this full range of brightness
levels. However, at any one point in time the visual system is only able to detect
variations in light intensity over a much smaller range. As the average brightness to
which the visual system is exposed changes over time, the range of discriminable 
brightnesses changes in a corresponding manner. This effect is most obvious if we 
move rapidly from a brightly lit outdoor area to a very dark room. At first, we are 
able to see little. After a while, however, details in the room start to become
apparent. The dark adaptation that occurs involves a number of physiological changes
in the eye. It takes several minutes for significant dark adaptation to occur and 40 
minutes or so for complete dark adaptation. If we then move back into the bright 
light, not only is vision difficult but it can actually be painful. Light adaptation is 
required before it is again possible to see clearly. Light adaptation occurs much 
more quickly than dark adaptation, typically requiring less than a minute. 

The two classes of photoreceptors in the human retina are sensitive to dif¬ 
ferent ranges of brightness. The cones provide visual information over most of 
what we consider normal lighting conditions, ranging from bright sunlight to dim 
indoor lighting. The rods are only effective at very low light levels. Photopic 
vision involves bright light in which only the cones are effective. Scotopic vision
involves very dim light in which only the rods are effective. There is a range of
intensities within which both cones and rods are sensitive to changes in light, which is
referred to as mesopic conditions (see Chapter 23).


Figure 22.12. (Figure label: direct sunlight.)


22.2.4 Field-of-View and Acuity

Each eye in the human visual system has a field-of-view of approximately 160° 
horizontal by 135° vertical. With binocular viewing, there is only partial overlap 
between the fields-of-view of the two eyes. This results in a wider overall field-of- 
view (approximately 200° horizontal by 135° vertical), with the region of overlap 
being approximately 120° horizontal by 135° vertical. 

With normal or corrected-to-normal vision, we usually have the subjective 
experience of being able to see relatively fine detail wherever we look. This is an 
illusion, however. Only a small portion of the visual field of each eye is actually 
sensitive to fine detail. To see this, hold a piece of paper covered with
normal-sized text at arm’s length, as shown in Figure 22.13. Cover one eye with the hand
not holding the paper. While staring at your thumb and not moving your eye, note 
that the text immediately above your thumb is readable while the text to either 
side is not. High acuity vision is limited to a visual angle slightly larger than 
your thumb held at arm’s length. We do not normally notice this because the 
eyes usually move frequently, allowing different regions of the visual field to be 
viewed at high resolution. The visual system then integrates this information over
time to produce the subjective experience of the whole visual field being seen at
high resolution.

Figure 22.13. If you hold a page of text at arm's length and stare at your thumb, only the
text near your thumb will be readable. Photo by Peter Shirley.

There is not enough bandwidth in the human visual cortex to process the
information that would result if there were a dense sampling of image intensity over the
whole of the retina. The combination of variable density photoreceptor packing 
in the retina and a mechanism for rapid eye movements to point at areas of in¬ 
terest provides a way to simultaneously optimize acuity and field-of-view. Other 
animals have evolved different ways of balancing acuity and field-of-view that 
are not dependent on rapid eye movements. Some have only high acuity vision, 
but limited to a narrow field-of-view. Others have wide field-of-view vision, but 
limited ability to see detail. 

The eye motions which focus areas of interest in the environment on the fovea 
are called saccades. Saccades occur very quickly. The time from a triggering 
stimulus to the completion of the eye movement is 150-200 ms. Most of this time 
is spent in the vision system planning the saccade. The actual motion takes 20 ms 
or so on average. The eyes are moving very quickly during a saccade, with the 
maximum rotational velocity often exceeding 500°/second. Between saccades, 
the eyes point towards an area of interest (fixate), taking 300 ms or so to acquire
fine detail visual information. The mechanism by which multiple fixations are 
integrated to form an overall subjective sense of fine detail over a wide field of 
view is not well understood. 

Figure 22.14 shows the variable packing density of cones and rods in the hu¬ 
man retina. The cones, which are responsible for vision under normal lighting, 
are packed most closely at the fovea of the retina (Figure 22.14). When the eye 
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Figure 22.14. Density of rods and cone in the human retina (after Osterberg (1935)). 


is fixated at a particular point in the environment, the image of that point falls on 
the fovea. The higher packing density of cones at the fovea results in a higher 
sampling frequency of the imaged light (see Chapter 9) and hence greater detail 
in the sampled pattern. Foveal vision encompasses about 1.7°, which is the same 
visual angle as the width of your thumb held at arm’s length. 
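
The thumb comparison can be checked with a little trigonometry. The sketch below computes the visual angle subtended by an object of a given size at a given distance. The thumb width (2 cm) and arm length (67 cm) are illustrative assumptions, not values taken from the text.

```python
import math


def visual_angle_deg(size, distance):
    """Visual angle in degrees subtended by an object.

    `size` and `distance` must be given in the same length unit.
    """
    return math.degrees(2.0 * math.atan(size / (2.0 * distance)))


# Assumed values: a thumb about 2 cm wide held about 67 cm from the eye.
thumb_angle = visual_angle_deg(2.0, 67.0)
print(f"thumb at arm's length subtends about {thumb_angle:.2f} degrees")
```

With these assumed measurements the result is close to the 1.7° quoted for foveal vision; other plausible thumb widths and arm lengths give somewhat different values.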

While a version of Figure 22.14 appears in most introductory texts on human 
visual perception, it provides only a partial explanation for the neurophysiological 
limitations on visual acuity. The outputs of individual rods and cones are pooled in
various ways by neural interconnects in the eye, before the information is shipped 
along the optic nerve to the visual cortex. 3 This pooling filters the signal provided 
by the pattern of incident illumination in ways that have important impacts on the 
patterns of light that are detectable. In particular, the farther away from the fovea, 
the larger the area over which brightness is averaged. As a consequence, spatial 
acuity drops sharply away from the fovea. Most figures showing rod and cone 
packing density indicate the location of the retinal blind spot, where the nerve 
bundle carrying optical information from the eye to the brain passes through the 
retina, and there is no sensitivity to light. By and large, the only practical impact 
of the blind spot on real-world perception is its use as an illusion in introduc¬ 
tory perception texts, since normal eye movements otherwise compensate for the 
temporary loss of information. 


3 All of the cells in the optic nerve and almost all cells in visual cortex have an associated retinal
receptive field. Patterns of light hitting the retina outside of a cell’s receptive field have no effect on
the firing rate of that cell.
As shown in Figure 22.14, the packing density of rods drops to zero at the
center of the fovea. Away from the fovea, the rod density first increases and then
decreases. One result of this is that there is no foveal vision when illumination
is very low. The lack of rods in the fovea can be demonstrated by observing a
night sky on a moonless night, well away from any city lights. Some stars will
be so dim that they will be visible if you look at a point in the sky slightly to the
side of the star, but they will disappear if you look directly at them. This occurs
because when you look directly at these features, the image of the features falls
only on the cones in the retina, which are not sufficiently light sensitive to detect
the feature. Looking slightly to the side causes the image to fall on the more
light sensitive rods. Scotopic vision is also limited in acuity, in part because
of the lower density of rods over much of the retina and in part because greater
pooling of signals from the rods occurs in the retina in order to increase the light
sensitivity of the visual information passed back to the brain.

22.2.5 Motion 

When reading about visual perception and looking at static figures on a printed 
page, it is easy to forget that motion is pervasive in our visual experience. The 
patterns of light that fall on the retina are constantly changing due to eye and body 
motion and the movement of objects in the world. This section covers our ability 
to detect visual motion. Section 22.3.4 describes how visual motion can be used 
to determine geometric information about the environment. Section 22.4.3 deals 
with the use of motion to guide our movement through the environment. 

The detectability of motion in a particular pattern of light falling on the retina 
is a complex function of speed, direction, pattern size, and contrast. The issue is 
further complicated because simultaneous contrast effects occur for motion per¬ 
ception in a manner similar to that observed in brightness perception. In the 
extreme case of a single small pattern moving against a contrasting, homogeneous
background, perceivable motion requires a rate of motion corresponding to
0.2°-0.3°/second of visual angle. Motion of the same pattern moving against a 
textured pattern is detectable at about a tenth this speed. 

With this sensitivity to retinal motion, combined with the frequency and ve¬ 
locity of saccadic eye movements, it is surprising that the world usually appears 
stable and stationary when we view it. The vision system accomplishes this in 
three ways. Contrast sensitivity is reduced during saccades, reducing the visual 
effects generated by these rapid changes in eye position. Between saccades, a 
variety of sophisticated and complex mechanisms adjust eye position to compen¬ 
sate for head and body motion and the motion of objects of interest in the world. 
Finally, the visual system exploits information about the position of the eyes to 
Figure 22.15. The aperture problem: (a) If a straight line or edge moves in such a way
that its end points are hidden, the visual information is not sufficient to determine the actual
motion of the line. (b) 2D motion of a line is unambiguous if there are any corners or other
distinctive markings on the line.


assemble a mosaic of small patches of high resolution imagery from multiple fix¬ 
ations into a single, stable whole. 

The motion of straight lines and edges is ambiguous if no endpoints or cor¬ 
ners are visible, a phenomenon referred to as the aperture problem (Figure 22.15). 
The aperture problem arises because the component of motion parallel to the line 
or edge does not produce any visual changes. The geometry of the real world 
is sufficiently complex that this rarely causes difficulties in practice, except for 
intentional illusions such as barber poles. The simplified geometry and textur¬ 
ing found in some computer graphics renderings, however, has the potential to 
introduce inaccuracies in perceived motion. 

Real-time computer graphics, film, and video would not be possible without 
an important perceptual phenomenon: discontinuous motion, in which a series of
static images are visible for discrete intervals in time and then move by discrete 
intervals in space, can be nearly indistinguishable from continuous motion. The 
effect is called apparent motion to highlight that the appearance of continuous 
motion is an illusion. 

Figure 22.16 illustrates the difference between continuous motion, which is 
typical of the real world, and apparent motion, which is generated by almost all 
dynamic image display devices. The motion plotted in Figure 22.16 (b) consists 
of an average motion comparable to that shown in Figure 22.16 (a), modulated by 
a high space-time frequency that accounts for the alternation between a stationary 
pattern and one that moves discontinuously to a new location. Apparent perception
of continuous motion occurs because the visual system is insensitive to the
high frequency component of the motion.

Figure 22.16. (a) Continuous motion. (b) Discontinuous motion with the same average
velocity. Under some circumstances, the perception of these two motion patterns may be
similar.

A compelling sense of apparent motion occurs when the rate at which individual
images appear is above about 10 Hz, as long as the positional changes
between successive images are not too great. This rate is not fast enough, however,
to produce a satisfying sense of continuous motion for most image display
devices. Almost all such devices introduce brightness variation as one image is
switched to the next. In well lit conditions, the human visual system is sensitive
to this varying brightness for rates of variation up to about 80 Hz. In lower light,
detectability is present up to about 40 Hz. When the rate of alternating brightness 
is sufficiently high, flicker fusion occurs and the variation is no longer visible. 

To produce a compelling sense of visual motion, an image display must there¬ 
fore satisfy two separate constraints: 

• images must be updated at a rate > 10 Hz; 

• any flicker introduced in the process of updating images must occur at a 
rate > 60-80 Hz. 

One solution is to require that the image update rate be greater than or equal to 
60-80 Hz. In many situations, however, this is simply not possible. For computer 
graphics displays, the frame computation time is often substantially greater than 
12-15 msec. Transmission bandwidth and limitations of older monitor technolo¬ 
gies limit normal broadcast television to 25-30 images per second. (Some HDTV 
formats operate at 60 images/sec.) Movies update images at 24 frames/second 
due to exposure time requirements and the mechanical difficulties of physically 
moving film any faster than that. 
Different display technologies solve this problem in different ways. Computer
displays refresh the displayed image at ~70-80 Hz, regardless of how often the 
contents of the image change. The term frame rate is ambiguous for such displays, 
since two values are required to characterize this display: refresh rate, which 
indicates the rate at which the image is redisplayed and frame update rate, which 
indicates the rate at which new images are generated for display. Standard non- 
HDTV broadcast television uses a refresh rate of 60 Hz (NTSC, used in North 
America and some other locations) or 50 Hz (PAL, used in most of the rest of 
the world). The frame update rate is half the refresh rate. Instead of displaying 
each new image twice, the display is interlaced by dividing alternating horizontal 
image lines into even and odd fields and alternating the display of these even and 
odd fields. Flicker is avoided in movies by using a mechanical shutter to blink 
each frame of the film three times before moving to the next frame, producing a 
refresh rate of 72 Hz while maintaining the frame update rate of 24 Hz. 

The use of apparent motion to simulate continuous motion occasionally pro¬ 
duces undesirable artifacts. Best known of these is the wagon wheel illusion in 
which the spokes of a rotating wheel appear to revolve in the opposite direction 
from what would be expected given the translational motion of the wheel. The 
wagon wheel illusion is an example of temporal aliasing. Spokes, or other spa¬ 
tially periodic patterns on a rotating disk, produce a temporally periodic signal 
for viewing locations that are fixed with respect to the center of the wheel or disk. 
Fixed frame update rates have the effect of sampling this temporally periodic signal
in time. If the temporal frequency of the sampled pattern is too high,
undersampling results in an aliased, lower temporal frequency appearing when the
image is displayed. Under some circumstances, this distortion of temporal frequency
causes a spatial distortion in which the wheel appears to move backwards. Wagon 
wheel illusions are more likely to occur with movies than with video, since the 
temporal sampling rate is lower. 
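
The temporal aliasing behind the wagon wheel illusion can be sketched numerically. The example below is a minimal illustration, assuming a wheel with evenly spaced identical spokes so that the temporal frequency of the pattern is the spoke count times the rotation rate. The spoke count, rotation rate, and the 60 Hz video sampling rate are hypothetical values chosen for the example.

```python
def spoke_frequency(spokes, rev_per_sec):
    """Temporal frequency (Hz) at which spokes pass a point fixed relative to the hub."""
    return spokes * rev_per_sec


def aliased_frequency(signal_hz, sample_hz):
    """Apparent frequency after sampling, folded into [-sample_hz/2, sample_hz/2).

    A negative result means the pattern appears to move backwards.
    """
    half = sample_hz / 2.0
    return (signal_hz + half) % sample_hz - half


# Hypothetical wheel: 8 spokes turning at 2.75 revolutions per second.
signal = spoke_frequency(8, 2.75)
film = aliased_frequency(signal, 24.0)   # sampled at the 24 Hz film rate
video = aliased_frequency(signal, 60.0)  # sampled at an assumed 60 Hz video rate

print(f"pattern frequency: {signal} Hz")
print(f"apparent frequency at 24 Hz sampling: {film} Hz")
print(f"apparent frequency at 60 Hz sampling: {video} Hz")
```

In this sketch the 24 Hz sampling folds the pattern frequency to a small negative value, corresponding to slow apparent backward rotation, while the higher sampling rate leaves the frequency unchanged.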

Problems can also occur when apparent motion imagery is converted from 
one medium to another. This is of particular concern when 24 Hz movies are 
transferred to video. Not only does a non-interlaced format need to be translated 
to an interlaced format, but there is no straightforward way to move from 24 
frames per second to 50 or 60 fields per second. Some high-end display devices 
have the ability to partially compensate for the artifacts introduced when film is 
converted to video. 

22.3 Spatial Vision 

One of the critical operations performed by the visual system is the estimation of
geometric properties of the visible environment, since these are central to determining
information about objects, locations, and events. Vision has sometimes
been described as inverse optics, to emphasize that one function of the visual system
is to invert the image formation process in order to determine the geometry,
materials, and lighting in the world that produced a particular pattern of light
on the retina. The central problem for a vision system is that properties of the
visible environment are confounded in the patterns of light imaged on the retina. 
Brightness is a function of both illumination and reflectance, and can depend on 
environmental properties across large regions of space due to the complexities of 
light transport. The image location of a projected environmental point can at best
be used to constrain the position of that point to a half-line. As a consequence,
it is rarely possible to uniquely determine the nature of the world that produced a 
particular imaged pattern of light. 

Determining surface layout— the location and orientation of visible surfaces 
in the environment—is thought to be a key step in human vision. Most discus¬ 
sions of how the vision system extracts information about surface layout from the 
patterns of light it receives divide the problem into a set of visual cues, with each 
cue describing a particular visual pattern which can be used to infer properties 
of surface layout along with the needed rules of inference. Since surface layout 
can rarely be determined accurately and unambiguously from vision alone, the 
process of inferring surface layout usually requires additional, non-visual infor¬ 
mation. This can come from other senses or assumptions about what is likely to 
occur in the real world. 

Visual cues are typically grouped into four categories. Ocularmotor cues
involve information about the position and focus of the eyes. Disparity cues in¬ 
volve information extracted from viewing the same surface point with two eyes, 
beyond that available just from the positioning of the eyes. Motion cues provide 
information about the world that arises from either the movement of the observer 
or the movement of objects. Pictorial cues result from the process of projecting 
3D surface shapes onto a 2D pattern of light that falls on the retina. This sec¬ 
tion deals with the visual cues relevant to the extraction of geometric information 
about individual points on surfaces. More general extraction of location and shape 
information is covered in Section 22.4. 


22.3.1 Frames of Reference and Measurement Scales 

Descriptions of the location and orientation of points on a visible surface must be
done within the context of a particular frame of reference that specifies the origin,
orientation, and scaling of the coordinate system used in representing the geometric
information. The human vision system uses multiple frames of reference,
partly because of the different sorts of information available from different visual
cues and partly because of the different purposes to which the information
is put (Klatzky, 1998). Egocentric representations are defined with respect to the
viewer’s body. They can be subdivided into coordinate systems fixed to the eyes, 
head, or body. Allocentric representations, also called exocentric representations, 
are defined with respect to something external to the viewer. Allocentric frames 
of reference can be local to some configuration of objects in the environment or 
can be globally defined in terms of distinctive locations, gravity, or geographic 
properties. 

The distance from the viewer to a particular visible location in the environ¬ 
ment, expressed in an egocentric representation, is often referred to as depth in 
the perception literature. Surface orientation can be represented in either egocen¬ 
tric or allocentric coordinates. In egocentric representations of orientation, the 
term slant is used to refer to the angle between the line of sight to the point and 
the surface normal at the point, while the term tilt refers to the orientation of the 
projection of the surface normal onto a plane perpendicular to the line of sight. 

Distance and orientation can be expressed in a variety of measurement scales. 
Absolute descriptions are specified using a standard that is not part of the sensed 
information itself. These can be culturally defined standards (e.g., meters), or
standards relative to the viewer’s body (e.g., eye height, the width of one’s shoul¬ 
ders). Relative descriptions relate one perceived geometric property to another 
(e.g., point a is twice as far away as point b). Ordinal descriptions are a special
case of relative measure in which the sign, but not the magnitude, of the relation
is all that is represented. Figure 22.17 provides a list of the most commonly considered
visual cues, along with a characterization of the sorts of information they
can potentially provide.

Figure 22.17. Common visual cues for absolute (a), relative (r), and ordinal (o) depth.


22.3.2 Ocularmotor Cues 

Ocularmotor information about depth results directly from the muscular control 
of the eyes. There are two distinct types of ocularmotor information. Accommo¬ 
dation is the process by which the eye optically focuses at a particular distance. 
Convergence (often referred to as vergence) is the process by which the two eyes 
are pointed towards the same point in three-dimensional space. Both accommo¬ 
dation and convergence have the potential to provide absolute information about 
depth. 

Physiologically, focusing in the human eye is accomplished by distorting the 
shape of the lens at the front of the eye. The vision system can infer depth from 
the amount of this distortion. Accommodation is a relatively weak cue to distance 
and is ineffective beyond about 2 m. Most people have increasing difficulty in
focusing over a range of distances once they are older than about 45. For
them, accommodation becomes even less effective. 

Those not familiar with the specifics of visual perception sometimes confuse
depth estimation from accommodation with depth information arising out of the
blur associated with limited depth-of-field in the eye. The accommodation depth
cue provides information about the distance to that portion of the visual field that
is in focus. It does not depend on the degree to which other portions of the visual
field are out of focus, other than that blur is used by the visual system to adjust
focus. Depth-of-field does seem to provide a degree of ordinal depth information
(Figure 22.18), though this effect has received only limited investigation.

Figure 22.18. Does the central square appear in front of the pattern of circles or is it seen
as appearing through a square hole in the pattern of circles? The only difference in the two
images is the sharpness of the edge between the line and circle patterns (Marshall et al.
(1999), used by permission).

Figure 22.19. The vergence of the two eyes provides information about the distance to the
point on which the eyes are fixated.

If two eyes fixate on the same point in space, trigonometry can be used to
determine the distance from the viewer to the viewed location (Figure 22.19). For
the simplest case, in which the point of interest is directly in front of the viewer,

$$z = \frac{\mathrm{ipd}/2}{\tan\theta}, \tag{22.5}$$

where z is the distance to a point in the world, ipd is the interpupillary distance
indicating the distance between the eyes, and θ is the vergence angle indicating
the orientation of the eyes relative to straight ahead. For small θ, which is the case
for the geometric configuration of human eyes, tan θ ≈ θ when θ is expressed in
radians. Thus, differences in vergence angle specify differences in depth by the
following relationship:


$$\Delta\theta \approx \frac{\mathrm{ipd}}{2} \cdot \Delta\!\left(\frac{1}{z}\right). \tag{22.6}$$

As θ → 0 in uniform steps, Δz gets increasingly larger. This means that stereo
vision is less sensitive to changes in depth as the overall depth increases. Conver¬ 
gence in fact only provides information on absolute depth for distances out to a 
few meters. Beyond that, changes in distance produce changes in vergence angle 
that are too small to be useful. 
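
The loss of depth sensitivity with distance can be seen by evaluating Equation (22.5) for a sequence of equally spaced vergence angles. The following is a minimal sketch rather than a model of the visual system. The interpupillary distance of 64 mm and the starting fixation distance of 0.5 m are assumed values, and the function names are introduced here only for illustration.

```python
import math

IPD_M = 0.064  # assumed interpupillary distance in meters (not from the text)


def distance_from_vergence(theta, ipd=IPD_M):
    """Equation (22.5): z = (ipd / 2) / tan(theta), with theta in radians."""
    return (ipd / 2.0) / math.tan(theta)


def vergence_from_distance(z, ipd=IPD_M):
    """Inverse of Equation (22.5) for a point directly in front of the viewer."""
    return math.atan((ipd / 2.0) / z)


# Start fixated at 0.5 m and reduce the vergence angle in uniform steps.
theta_near = vergence_from_distance(0.5)
step = theta_near / 10.0
thetas = [theta_near - i * step for i in range(9)]  # stays above zero

depths = [distance_from_vergence(t) for t in thetas]
depth_steps = [far - near for near, far in zip(depths, depths[1:])]

for theta, z in zip(thetas, depths):
    print(f"theta = {math.degrees(theta):5.2f} deg -> z = {z:5.2f} m")
```

Each uniform decrease in vergence angle corresponds to a larger jump in depth than the one before it, which is the behavior described above.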

There is an interaction between accommodation and convergence in the human
visual system: accommodation is used to help determine the appropriate
vergence angle, while vergence angle is used to help set the focus distance. Normally,
this helps the visual system when there is uncertainty in setting either
accommodation or vergence. However, stereographic computer displays break the
relationship between focus and convergence that occurs in the real world, leading 
to a number of perceptual difficulties (Wann et al., 1995). 


22.3.3 Binocular Disparity 

The vergence angle of the eyes when fixated on a common point in space is only 
one of the ways that the visual system is able to determine depth from binocular 
stereo. A second mechanism involves a comparison of the retinal images in the 
two eyes and does not require information about where the eyes are pointed. A 
simple example demonstrates the effect. Hold your arm straight out in front of 
you, with your thumb pointed up. Stare at your thumb and then close one eye. 
Now, simultaneously open the closed eye and close the open eye. Your thumb will 
appear to be more or less stationary, while the more distant surfaces seen behind 
your thumb will appear to move from side to side (Figure 22.20). The change 
in retinal position of points in the scene between the left and right eyes is called 
disparity. 

The binocular disparity cue requires that the vision system be able to match 
the image of points in the world in one eye with the imaged locations of those 
points in the other eye, a process referred to as the correspondence problem. This 
is a relatively complicated process and is only partially understood. Once correspondences
have been established, the relative positions at which particular
points in the world project onto the left and right retinas indicate whether the
points are closer than or farther away than the point of fixation. Crossed disparity
occurs when the corresponding points are displaced outward relative to the fovea
and indicates that the surface point is closer than the point of fixation. Uncrossed
disparity occurs when the corresponding points are displaced inward relative to
the fovea and indicates that the surface point is farther away than the point of
fixation (Figure 22.21). 4 Binocular disparity is a relative depth cue, but it can
provide information about absolute depth when scaled by convergence. Equation
(22.5) applies to binocular disparity as well as binocular convergence. As with
convergence, the sensitivity of binocular disparity to changes in depth decreases
with depth.

Figure 22.20. Binocular disparity. The view from the left and right eyes shows an offset for
surface points at depths different from the point of fixation. Images courtesy Peter Shirley.

Figure 22.21. Near the line of sight, surface points nearer than the fixation point produce
disparities in the opposite direction from those associated with surface points more distant
than the fixation point.

22.3.4 Motion Cues 

Relative motion between the eyes and visible surfaces will produce changes in the 
image of those surfaces on the retina. Three-dimensional relative motion between 
the eye and a surface point produces two-dimensional motion of the projection of 
the surface point on the retina. This retinal motion is given the name optic flow. 
Optic flow serves as the basis for several types of depth cues. In addition, optic 
flow can be used to determine information about how a person is moving in the 
world and whether or not a collision is imminent (Section 22.4.3). 

If a person moves to the side while continuing to fixate on some surface point,
then optic flow provides information about depth similar to stereo disparity. This
is referred to as motion parallax. For other surface points that project to retinal
locations near the fixation point, zero optic flow indicates a depth equivalent
to the fixation point; flow in the opposite direction to head translation indicates
nearer points, equivalent to crossed disparity; and flow in the same direction as
head translation indicates farther points, equivalent to uncrossed disparity (Figure
22.22). Motion parallax is a powerful cue to relative depth. In principle,
motion parallax can provide absolute depth information if the visual system has
access to information about the velocity of head motion. In practice, motion parallax
appears at best to be a weak cue for absolute depth.

4 Technically, crossed and uncrossed disparities indicate that the surface point generating the
disparity is closer to or farther away from the horopter. The horopter is not a fixed distance away from
the eyes but rather it is a curved surface passing through the point of fixation.

Figure 22.22. (a) Motion parallax generated by sideways movement to the right while
looking at an extended ground plane. (b) The same motion, with eye tracking of the fixation
point.

Figure 22.23. Discontinuities in optic flow signal surface boundaries. In many cases, the
sign of the depth change (i.e., the ordinal depth) can be determined.

In addition to egocentric depth information due to motion parallax, visual 
motion can also provide information about the three-dimensional shape of ob¬ 
jects moving relative to the viewer. In the perception literature, this is known as 
the kinetic depth effect. In computer vision, it is referred to as structure-from- 
motion. The kinetic depth effect presumes that one component of object motion 
is rotation in depth, meaning that there is a component of rotation around an axis 
perpendicular to the line of sight. 

Optic flow can also provide information about the shape and location of sur¬ 
face boundaries, as shown in Figure 22.23. Spatial discontinuities in optic flow 
almost always either correspond to depth discontinuities or result from indepen¬ 
dently moving objects. Simple comparisons of the magnitude of optic flow are 
insufficient to determine the sign of depth changes, except in the special case of 
a viewer moving through an otherwise static world. Even when independently 
moving objects are present, however, the sign of the change in depth across sur¬ 
face boundaries can often be determined by other means. Motion often changes 
the portion of the more distant surface visible at surface boundaries. The appearance
(accretion) or disappearance (deletion) of surface texture occurs because the
nearer, occluding surface progressively uncovers or covers portions of the more
distant, occluded surface. Comparisons of the motion of surface texture to either
side of a boundary can also be used to infer ordinal depth, even in the absence 
of accretion or deletion of the texture. Discontinuities in optic flow and accre¬ 
tion/deletion of surface texture are referred to as dynamic occlusion cues and are 
another powerful source of visual information about the spatial structure of the 
environment. 

The speed at which a viewer is traveling relative to points in the world cannot be
determined from visual motion alone (see Section 22.4.3). Despite this limitation, 
it is possible to use visual information to determine the time it will take to reach a 
visible point in the world even when speed cannot be determined. When velocity 
is constant, time-to-contact (often referred to as time-to-collision ) is given by the 
retinal size of an entity towards which the observer is moving, divided by the rate 
at which that image size is increasing. 5 In the biological vision literature, this is 
often called the τ function (Lee & Reddish, 1981). If distance information to the
structure in the world on which the time-to-collision estimate is based is available, 
then this can be used to determine speed. 
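
The time-to-contact relationship can be illustrated with a small numerical sketch. It assumes an object of fixed physical width approached head-on at constant speed, and estimates the rate of change of retinal (angular) size with a finite difference. The function names and the sizes, distances, and speeds are hypothetical choices for the example.

```python
import math


def angular_size(width, distance):
    """Visual angle (radians) subtended by an object of the given width."""
    return 2.0 * math.atan(width / (2.0 * distance))


def time_to_contact(size, size_rate):
    """Retinal size divided by its rate of increase, as described in the text."""
    return size / size_rate


def tau_for_approach(width, distance, speed, dt=1e-4):
    """Estimate time-to-contact from image measurements only.

    The rate of change of angular size is approximated by a central
    difference over a short time interval dt.
    """
    before = angular_size(width, distance + speed * dt)
    after = angular_size(width, distance - speed * dt)
    rate = (after - before) / (2.0 * dt)
    return time_to_contact(angular_size(width, distance), rate)


# A 0.5 m object 10 m away approached at 2 m/s: distance / speed is 5 s.
tau_small = tau_for_approach(0.5, 10.0, 2.0)
# A scene twice as large, twice as far, approached twice as fast.
tau_large = tau_for_approach(1.0, 20.0, 4.0)

print(f"tau (small, slow scene): {tau_small:.3f} s")
print(f"tau (large, fast scene): {tau_large:.3f} s")
```

The two scenes produce the same sequence of retinal images, so they yield the same time-to-contact even though the speeds differ. This is consistent with the observation that speed cannot be recovered from visual motion alone.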


22.3.5 Pictorial Cues 


An image can contain much information about the spatial structure of the world 
from which it arose, even in the absence of binocular stereo or motion. As evi¬ 
dence for this, note that the world still appears three-dimensional even if we close 
one eye, hold our head stationary, and nothing moves in the environment. (As 
discussed in Section 22.5, the situation is more complicated in the case of pho¬ 
tographs and other displayed images.) There are three classes of such pictorial 
depth cues. The best known of these involve linear perspective. There are also 
a number of occlusion cues that provide information about ordinal depth even in 
the absence of perspective. Finally, illumination cues involving shading, shadows 
and interreflections, and aerial perspective also provide visual information about 
spatial layout. 

The term linear perspective is often used to refer to properties of images in¬ 
volving object size in the image scaled by distance, the convergence of parallel 
lines, the ground plane extending to a visible horizon, and the relationship be¬ 
tween the distance to objects on the ground plane and the image location of those 
objects relative to the horizon (Figure 22.24). More formally, linear perspective 
cues are those visual cues which exploit the fact that under perspective projection, 
the image location onto which points in the world are projected is scaled by 1/z,
where z is the distance from the point of projection to the point in the environment.
Direct consequences of this relationship are that points that are farther away
are projected to points closer to the center of the image (convergence of parallel
lines) and that the spacing between the image of points in the world decreases for
more distant world points (object size in the image is scaled by distance). 6 The
fact that the image of an infinite flat surface in the world ends at a finite horizon
is explained by examining the perspective projection equation as z → ∞.

5 The terms time-to-collision and time-to-contact are misleading, since contact will only occur if
the viewer’s trajectory actually passes through or near the entity under view.

Figure 22.24. The classical linear perspective effects include object size scaled by distance,
the convergence of parallel lines, the ground plane extending to a visible horizon, and
position on the ground plane relative to the horizon. Image courtesy Sam Pullara.

Figure 22.25. Absolute distance to locations on the ground plane can be determined based
on declination angle from the horizon and eye height.
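
The consequences of the 1/z scaling can be seen in a short sketch. It assumes the planar projection used in computer graphics, with the image plane at unit distance from the center of projection, and follows two parallel lines on the ground plane. The eye height of 1.6 m and the 1 m lateral offsets are assumed values, and `rail_image` is a helper introduced only for this example.

```python
EYE_HEIGHT = 1.6  # assumed eye height above the ground plane, in meters


def project(x, y, z):
    """Perspective projection onto an image plane at unit distance: scale by 1/z."""
    return (x / z, y / z)


def rail_image(z, half_gauge=1.0):
    """Image positions of two ground-plane points at depth z.

    The points lie half_gauge meters to the left and right of the viewer,
    so following them as z grows traces two parallel lines on the ground.
    """
    left = project(-half_gauge, -EYE_HEIGHT, z)
    right = project(half_gauge, -EYE_HEIGHT, z)
    return left, right


for z in (2.0, 10.0, 100.0):
    (x_left, y), (x_right, _) = rail_image(z)
    print(f"z = {z:6.1f} m: image gap = {x_right - x_left:.3f}, image height = {y:.3f}")
```

As z grows, the gap between the two lines shrinks toward zero and their image height rises toward zero, which is the image height of the horizon in this setup.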

With the exception of size-related effects described in Section 22.4.2, most 
pictorial depth cues involving linear perspective depend on objects of interest be¬ 
ing in contact with a ground plane. In effect, these cues estimate not the distance 
to the objects but, instead, the distance to the contact point on the ground plane. 
Assuming observer and object are both on top of a horizontal ground plane, then
locations on the ground plane lower in the view will be closer. Figure 22.25 illustrates
this effect quantitatively. For a viewpoint h above the ground and an angle
of declination θ between the horizon and a point of interest on the ground, the
point in question is a distance d = h cot θ from the point at which the observer
is standing. The angle of declination provides relative depth information for arbitrary
fixed viewpoints and can provide absolute depth when scaling by eye height
(h) is possible.
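
The declination relationship d = h cot θ is simple to evaluate directly. The sketch below assumes a flat, horizontal ground plane and an eye height of 1.6 m. Both the eye height and the sample angles are illustrative assumptions.

```python
import math


def ground_distance(eye_height, declination):
    """Distance along the ground to a point seen at the given declination.

    Implements d = h cot(theta), with the declination angle theta measured
    in radians below the horizon.
    """
    return eye_height / math.tan(declination)


EYE_HEIGHT = 1.6  # assumed eye height in meters

for degrees in (45.0, 20.0, 10.0, 5.0, 2.0):
    d = ground_distance(EYE_HEIGHT, math.radians(degrees))
    print(f"declination {degrees:4.1f} deg -> distance {d:6.2f} m")
```

Points closer to the horizon (smaller declination) map to rapidly increasing distances. Doubling the assumed eye height doubles every distance, which is why the cue gives absolute depth only when eye height is known.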

While the human visual system almost certainly makes use of angle of declination
as a depth cue, the exact mechanisms used to acquire the needed information
are not clear. The angle θ could be obtained relative to either gravity or the
visible horizon. There is some evidence that both are used in human vision. Eye 
height h could be based on posture, visually determined by looking at the ground 
at one’s feet, or learned by experience and presumed to be constant. While a 

6 The actual mathematics for analyzing the specifics of biological vision are different, since eyes 
are not well approximated by the planar projection formulation used in computer graphics and most 
other imaging applications. 
Figure 22.26. Shadows can indirectly function as a depth cue by associating the depth of
an object with a location on the ground plane (after Kersten et al. (1997)). 


number of researchers have investigated this issue, if and how these values are 
determined is not yet known with certainty. 

Shadows provide a variety of types of information about three-dimensional 
spatial layout. Attached shadows indicate that an object is in contact with another 
surface, often consisting of the ground plane. Detached shadows indicate that an 
object is close to some surface, but not in contact with that surface. Shadows can 
serve as an indirect depth cue by causing an object to appear at the depth of the 
location of the shadow on the ground plane (Yonas et al., 1978). When utilizing 
this cue, the visual system seems to make the assumption that light is coming 
from directly above (Figure 22.26). 

Vision provides information about surface orientation as well as distance. It 
is convenient to represent visually determined surface orientation in terms of tilt, 
defined as the orientation in the image of the projection of the surface normal, and 
slant, defined as the angle between the surface normal and the line of sight. 

A visible surface horizon can be used to find the orientation of an (effectively 
infinite) surface relative to the viewer. Determining tilt is straightforward, since 
the tilt of the surface is the orientation of the visible horizon. Slant can be re¬ 
covered as well, since the lines of sight from the eye point to the horizon define 
a plane parallel to the surface. In many situations, either the surface horizon is 
not visible or the surface is small enough that its far edge does not correspond 
to an actual horizon. In such cases, visible texture can still be used to estimate 
orientation. 

In the context of perception, the term texture refers to visual patterns consist¬ 
ing of sub-patterns replicated over a surface. The sub-patterns and their distri¬ 
bution can be fixed and regular, as for a checkerboard, or consistent in a more 
statistical sense, as in the view of a grassy field. 7 When a textured surface is 
viewed from an oblique angle, the projected view of the texture is distorted rela¬ 
tive to the actual markings on the surface. Two quite distinct types of distortions 
occur (Knill, 1998), both affected by the amount of slant. The position and size 

7 In computer graphics, the term texture has a different meaning, referring to any image that is 
applied to a surface as part of the rendering process. 
Figure 22.27. Texture cues for slant. (a) Near surface exhibiting compression and texture
gradient; (b) distant surface exhibiting only compression; (c) variability in appearance of near 
surface with regular geometric variability. 


of texture elements are subject to the linear perspective effects described above. 
This produces a texture gradient (Gibson, 1950) due to both element size and 
spacing decreasing with distance (Figure 22.27(a)). Both the image of individual 
texture elements and the distribution of elements are foreshortened under oblique
viewing (Figure 22.27(b)). This produces a compression in the direction of tilt. 
For example, an obliquely viewed circle appears as an ellipse, with the ratio of the 
minor to major axes equal to the cosine of the slant. Note that foreshortening it¬ 
self is not a result of linear perspective, though in practice both linear perspective 
and foreshortening provide information about slant. 8 
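
The foreshortening relation for a circle can be inverted to estimate slant from a measured ellipse. The sketch below is a hypothetical illustration: it assumes the texture element really is a circle on the surface and that perspective effects across the element are negligible, so that the axis ratio equals the cosine of the slant. The function name is our own.

```python
import math


def slant_from_ellipse(minor_axis, major_axis):
    """Slant in degrees implied by the image of a circular texture element.

    Uses minor / major = cos(slant); valid only under the assumptions
    stated above (circular element, foreshortening only).
    """
    if major_axis <= 0 or minor_axis < 0 or minor_axis > major_axis:
        raise ValueError("require 0 <= minor_axis <= major_axis, major_axis > 0")
    return math.degrees(math.acos(minor_axis / major_axis))
```

For example, an ellipse half as tall as it is wide corresponds to a slant of 60 degrees, while an undistorted circle corresponds to a slant of zero.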

For texture gradients to serve as a cue to surface slant, the average size and 
spacing of texture elements must be constant over the textured surface. If spatial
variability in size and spacing in the image is not due in its entirety to the
projection process, then attempts to invert the effects of projection will produce 
incorrect inferences about surface orientation. Likewise, the foreshortening cue 
fails if the shape of texture elements is not isotropic, since then asymmetric tex¬ 
ture element image shapes would occur in situations not associated with oblique 
viewing. These are examples of the assumptions often required in order for spa¬ 
tial visual cues to be effective. Such assumptions are reasonable to the degree that 
they reflect commonly occurring properties of the world. 

Shading also provides information about surface shape (Figure 22.28). The 
brightness of viewed points on a surface depends on the surface reflectance and 
the orientation of the surface with respect to directional light sources and the 
observation point. When the relative position of an object, viewing direction, 
and illumination direction remain fixed, changes in brightness over a constant 
reflectance surface are indications of changes in the orientation of the surface of 

8 A third form of visual distortion occurs when surfaces with distinct 3D surface relief are viewed 
obliquely (Leung & Malik, 1997), as shown in Figure 22.27(c). Nothing is currently known about if or
how this effect might be used by the human vision system to determine slant. 

Figure 22.28. Shape-from-shading. The images in (a) and (b) appear to have different 
3D shapes because of differences in the rate of change of brightness over their surfaces. 



Figure 22.29. Shading can generate a strong perception of three-dimensional shape. In this 
figure, the effect is stronger if you view the image from several meters away using one eye. 
It becomes yet stronger if you place a piece of cardboard in front of the figure with a hole cut 
out slightly smaller than the picture (see Section 22.5). Image courtesy Albert Yonas. (See 
also Plate XIII.) 
Figure 22.30. (a) Junctions provide information about occlusion and the convexity or concavity
of corners. (b) Common junction types for planar surface objects.


the object. Shape-from-shading is the process of recovering surface shape from 
these variations in observed brightness. It is almost never possible to recover the 
actual orientation of surfaces from shading alone, though shading can often be 
combined with other cues to provide an effective indication of surface shape. For 
surfaces with fine-scale geometric variability, shading can provide a compelling 
three-dimensional appearance, even for an image rendered on a two-dimensional 
surface (Figure 22.29). 
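
One way to see why shading alone underdetermines orientation is to evaluate a simple reflectance model for several surface normals. The sketch below assumes a Lambertian (ideal diffuse) surface lit by a single distant light, which is our simplifying assumption rather than a model given in the text; under it, brightness depends only on the angle between the normal and the light direction. The function name and parameters are illustrative.

```python
import math


def lambertian_brightness(normal, light_dir, albedo=1.0):
    """Brightness of a diffuse surface patch: albedo * max(0, cos(angle)).

    normal and light_dir are 3D vectors of any non-zero length; the angle
    is measured between the surface normal and the direction to the light.
    """
    n_len = math.sqrt(sum(c * c for c in normal))
    l_len = math.sqrt(sum(c * c for c in light_dir))
    if n_len == 0 or l_len == 0:
        raise ValueError("vectors must be non-zero")
    cos_angle = sum(n * l for n, l in zip(normal, light_dir)) / (n_len * l_len)
    return albedo * max(0.0, cos_angle)
```

With the light along (0, 0, 1), the normals (3, 0, 4) and (0, 3, 4) produce the same brightness even though the surface is tilted in different directions, so a single brightness value cannot identify the orientation.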

There are a number of pictorial cues that yield ordinal information about 
depth, without directly indicating actual distance. In line drawings, different types 
of junctions provide constraints on the 3D geometry that could have generated the 
drawing (Figure 22.30). Many of these effects occur in more natural images as 
well. Most perceptually effective of the junction cues are T-junctions, which are 
strong indicators that the surface opposite the stem of the T is occluding at least 
one more distant surface. T-junctions often generate a sense of amodal completion,
in which one surface is seen to continue behind a nearer, occluding surface
(Figure 22.31). 

Atmospheric effects cause visual changes that can provide information about 
depth, particularly outdoors over long distances. Leonardo da Vinci was the first 



Figure 22.31. T-junctions cause the left disk to appear to be continuing behind the rectangle,
while the right disk appears in front of the rectangle which is seen to continue behind the disk. 
to describe aerial perspective (also called atmospheric perspective), in which
scattering reduces the contrast of distant portions of the scene and causes them 
to appear more bluish than if they were nearer (da Vinci, 1970) (see Plate XX). 
Aerial perspective is predominately a relative depth cue, though there is some 
speculation that it may affect perception of absolute distance as well. While many 
people believe that more distant objects look blurrier due to atmospheric effects,
atmospheric scattering actually causes little blur. 


22.4 Objects, Locations, and Events 

While there is fairly wide agreement among current vision scientists that the pur¬ 
pose of vision is to extract information about objects, locations, and events, there 
is little consensus on the key features of what information is extracted, how it is 
extracted, or how the information is used to perform tasks. Significant contro¬ 
versies exist about the nature of object recognition and the potential interactions 
between object recognition and other aspects of perception. Most of what we 
know about location involves low-level spatial vision, not issues associated with 
spatial relationships between complex objects or the visual processes required to 
navigate in complex environments. We know a fair amount about how people 
perceive their speed and heading as they move through the world, but have only 
a limited understanding of actual event perception. Visual attention involves as¬ 
pects of the perception of objects, locations, and events. While there is much data 
about the phenomenology of visual attention for relatively simple and well con¬ 
trolled stimuli, we know much less about how visual attention serves high-level 
perceptual goals. 


22.4.1 Object Recognition 

Object recognition involves segregating an image into constituent parts corre¬ 
sponding to distinct physical entities and determining the identity of those entities. 
Figure 22.32 illustrates a few of the complexities associated with this process. We 
have little difficulty recognizing that the image on the left is some sort of vehi¬ 
cle, even though we have never before seen this particular view of a vehicle nor 
do most of us typically associate vehicles with this context. The image on the 
right is less easily recognizable until the page is turned upside down, indicating 
an orientational preference in human object recognition. 

Object recognition is thought to involve two fairly distinct steps. The first
step organizes the visual field into groupings likely to correspond to objects and
Figure 22.32. The complexities of object recognition. (a) We recognize a vehicle-like object
even though we have likely never seen this particular view of a vehicle before. (b) The image
is hard to recognize based on a quick view. It becomes much easier to recognize if the book 
is turned upside down. 


surfaces. These grouping processes are very powerful (see Figure 22.33), though 
there is little or no conscious awareness of the low-level image features that gener¬ 
ate the grouping effect. 9 Grouping is based on the complex interaction of proxim¬ 
ity, similarities in the brightness, color, shape, and orientation of primitive struc¬ 
tures in the image, common motion, and a variety of more complex relationships. 

The second step in object recognition is to interpret groupings as identified 
objects. A computational analysis suggests that there are a number of distinctly 





Figure 22.33. Images are perceptually organized into groupings based on a complex set 
of similarity and organizational criteria. (a) Similarity in brightness results in four horizontal
groupings. (b) Proximity results in three vertical groupings.


9 The most common form of visual camouflage involves adding visual textures that fool the perceptual
grouping processes so that the view of the world cannot be organized in a way that separates
out the object being camouflaged. 
Figure 22.34. Template matching. The bright spot in the right image indicates the best
match location to the template in the left image. Image courtesy National Archives and 
Records Administration. 


different ways in which an object can be identified. The perceptual data is unclear 
as to which of these are actually used in human vision. Object recognition requires 
that the vision system have available to it descriptions of each class of object 
sufficient to discriminate each class from all others. Theories of object recognition 
differ in the nature of the information describing each class and the mechanisms 
used to match these descriptions to actual views of the world. 

Three general types of descriptions are possible. Templates represent object 
classes in terms of prototypical views of objects in each class. Figure 22.34 shows 
a simple example. Structural descriptions represent object classes in terms of dis¬ 
tinctive features of each class likely to be easily detected in views of the object, 
along with information about the geometric relationships between the features. 
Structural descriptions can be represented in either 2D or 3D. For 2D models
of object types, there must be a separate description for each distinctly different
potential view of the object. For 3D models, two distinct forms of matching
strategies are possible. In one, the three-dimensional structure of the viewed ob¬ 
ject is determined prior to classification using whatever spatial cues are available 
and then this 3D description of the view is matched to 3D prototypes of known 
objects. The other possibility is that some mechanism allows the determination 
of the orientation of the yet-to-be identified object under view. This orientation 
information is used to rotate and project potential 3D descriptions in a way that 
allows a 2D matching of the description and the viewed object. Finally, the last 
option for describing the properties of object classes involves invariant features 
which describe classes of objects in terms of more generic geometric properties, 
particularly those that are likely to be insensitive to different views of the object.
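
Of the three kinds of description, templates are the easiest to illustrate in code. The sketch below is a toy version of template matching in the spirit of Figure 22.34: it slides a small template over a grayscale image stored as nested lists and reports the position with the smallest sum of squared differences. The similarity measure, the function name, and the data layout are our assumptions; the text does not say which measure was used for the figure, and nothing here is a claim about how human vision works.

```python
def best_match(image, template):
    """Return ((row, col), score) of the best template position.

    image and template are rectangular lists of lists of numbers. The score
    is the sum of squared differences, so lower is better and 0 is exact.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    if th > ih or tw > iw:
        raise ValueError("template must fit inside the image")
    best_pos, best_score = None, float("inf")
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            score = sum(
                (image[r + i][c + j] - template[i][j]) ** 2
                for i in range(th)
                for j in range(tw)
            )
            if score < best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score
```

Because the comparison is made pixel by pixel, a template of this kind fails as soon as the object is seen from a different view or at a different scale, which is why template theories need many prototypical views per class.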

22.4.2 Size and Distance 

In the absence of more definitive information about depth, objects which project 
onto a larger area of the retina are seen as closer compared with objects which 
Figure 22.35. Left: perspective and familiar size cues are consistent. Right: perspective
and familiar size cues are inconsistent. Images courtesy Peter Shirley, Scott Kuhl, and
J. Dylan Lacewell. 


project to a smaller retinal area, an effect called relative size. A more powerful 
cue involves familiar size, which can provide information for absolute distance 
to recognizable objects of known size. The strength of familiar size as a depth 
cue can be seen in illusions such as Figure 22.35, in which it is put in conflict 
with ground-plane, perspective-based depth cues. Familiar size is one part of the 
size-distance relationship, relating the physical size of an object, the optical size 
of the same object projected onto the retina, and the distance of the object from 
the eye (Figure 22.36). 
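
The size-distance relationship can be written as a pair of inverse functions. The sketch below assumes an object of physical extent size that is centered on, and perpendicular to, the line of sight; the function names are illustrative and angles are in radians.

```python
import math


def visual_angle(size, distance):
    """Visual angle (radians) subtended by an object of the given size."""
    if size <= 0 or distance <= 0:
        raise ValueError("size and distance must be positive")
    return 2.0 * math.atan(size / (2.0 * distance))


def distance_from_size(size, angle):
    """Distance at which an object of known size subtends the given angle."""
    if size <= 0 or not 0.0 < angle < math.pi:
        raise ValueError("size must be positive and angle in (0, pi)")
    return size / (2.0 * math.tan(angle / 2.0))
```

Knowing any two of size, distance, and visual angle fixes the third, which is what allows familiar size to act as an absolute distance cue.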

When objects are sitting on top of a flat ground plane, additional sources for 
depth information become available, particularly when the horizon is either vis- 



Figure 22.36. The size-distance relationship allows the distance to objects of known size 
to be determined based on the visual angle subtended by the object. Likewise, the size of 
an object at a known distance can be determined based on the visual angle subtended by the
object. 
Figure 22.37. (a) The horizon ratio can be used to determine depth by comparing the visible
portion of an object below the horizon to the total vertical visible extent of the object. (b) A
real-world example. 


ible or can be derived from other perspective information. The angle of decli¬ 
nation to the contact point on the ground is a relative depth cue and provides 
absolute egocentric distance when scaled by eye height, as previously shown in 
Figure 22.25. The horizon ratio, in which the total visible height of an object
is compared with the visible extent of that portion of the object appearing below 
the horizon, can be used to determine the actual size of objects, even when the 
distance to the objects is not known (Figure 22.37). Underlying the horizon ratio 
is the fact that for a flat ground plane, the line of sight to the horizon intersects 
objects at a position that is exactly an eye height above the ground. 
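
Because the horizon cuts every object on the ground plane at eye height, object size follows from a ratio of image measurements. The sketch below is a hypothetical illustration that assumes an upright object resting on a flat ground plane, a horizontal viewing direction, and image extents measured in any consistent unit such as pixels. The function and parameter names are ours.

```python
def object_height_from_horizon(eye_height, image_height, base_to_horizon):
    """Physical height of an object from the horizon ratio.

    image_height is the image extent from the object's base to its top, and
    base_to_horizon is the image extent from its base up to the horizon line.
    The second extent corresponds to exactly one eye height in the world.
    """
    if image_height <= 0 or base_to_horizon <= 0:
        raise ValueError("image extents must be positive")
    return eye_height * image_height / base_to_horizon
```

The result depends only on the ratio of the two image extents, so it is unchanged when the object moves farther away and both extents shrink together.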





Figure 22.38. (a) Size constancy makes hands positioned at different distances from the
eye appear to be nearly the same size for real-world viewing, even though the retinal sizes
are quite different. (b) The effect is less strong when one hand is partially occluded by the
other, particularly when one eye is closed. Images courtesy Peter Shirley and Pat Moulis. 
Figure 22.39. Shape
constancy—the table looks 
rectangular even though its 
shape in the image is an irregular
four-sided polygon.


The human visual system is sufficiently able to determine the absolute size of 
most viewed objects; our perception of size is dominated by the actual physical
size, and we have almost no conscious awareness of the corresponding retinal
size of objects. This is similar to lightness constancy, discussed earlier, in that 
our perception is dominated by inferred properties of the world, not the low level 
features actually sensed by photoreceptors in the retina. Gregory (1997) describes 
a simple example of size constancy. Hold your two hands out in front of you, one 
at arm's length and the other at half that distance away from you (Figure 22.38(a)).
Your two hands will look almost the same size, even though the retinal sizes differ 
by a factor of two. The effect is much less strong if the nearer hand partially occludes
the more distant hand, particularly if you close one eye (Figure 22.38(b)).
The visual system also exhibits shape constancy, where the perception of geometric
structure is closer to actual object geometry than might be expected given the
distortions of the retinal image due to perspective (Figure 22.39). 


22.4.3 Events 

Most aspects of event perception are beyond the scope of this chapter, since they 
involve complex non-visual cognitive processes. Three types of event perception 
are primarily visual, however, and are also of clear relevance to computer graph¬ 
ics. Vision is capable of providing information about how a person is moving in 
the world, the existence of independently moving objects in the world, and the 
potential for collisions either due to observer motion or due to objects moving 
towards the observer. 

Vision can be used to determine rotation and the direction of translation rel¬ 
ative to the environment. The simplest case involves movement towards a flat 
surface oriented perpendicularly to the line of sight. Presuming that there is suffi¬ 
cient surface texture to enable the recovery of optic flow, the flow field will form 
a symmetric pattern as shown in Figure 22.40(a). The location in the field of view 
of the focus of expansion of the flow field will have an associated line of sight 
corresponding to the direction of translation. While optic flow can be used to vi¬ 
sually determine the direction of motion, it does not contain enough information 
to determine speed. To see this, consider the situation in which the world is made 
twice as large and the viewer moves twice as fast. The decrease in the magnitude 
of flow values due to the doubling of distances is exactly compensated for by the 
increase in the magnitude of flow values due to the doubling of velocity, resulting 
in an identical flow field. 
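
The scale ambiguity can be verified with the standard expression for the image motion of a static point seen by a translating pinhole camera. The sketch below assumes unit focal length, image coordinates x = X/Z and y = Y/Z, and a camera that translates without rotating; the function name is illustrative.

```python
def translational_flow(point, velocity):
    """Image velocity (u, v) of a static 3D point for a translating observer.

    point is (X, Y, Z) in the observer's frame with Z > 0 along the line of
    sight; velocity is the observer's translation (Tx, Ty, Tz) per unit time.
    """
    X, Y, Z = point
    Tx, Ty, Tz = velocity
    if Z <= 0:
        raise ValueError("point must be in front of the observer")
    x, y = X / Z, Y / Z
    # The point moves by -velocity relative to the observer; differentiating
    # x = X/Z and y = Y/Z gives the two components below.
    return ((x * Tz - Tx) / Z, (y * Tz - Ty) / Z)
```

Scaling both the point and the velocity by the same factor leaves x and y unchanged and cancels in the division by Z, so the returned flow is identical. A point lying exactly along the direction of translation has zero flow, which is the focus of expansion.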

Figure 22.40(b) shows the optic flow field resulting from the viewer (or more 
accurately, the viewer’s eyes) rotating around the vertical axis. Unlike the situa- 



Figure 22.40. (a) Movement towards a flat, textured surface produces an expanding flow
field, with the focus of expansion indicating the line of sight corresponding to the direction
of motion. (b) The flow field resulting from rotation around the vertical axis while viewing
a flat surface oriented perpendicularly to the line of sight. (c) The flow field resulting from
translation parallel to a flat, textured surface. 


tion with respect to translational motion, optic flow provides sufficient informa¬ 
tion to determine both the axis of rotation and the (angular) speed of rotation. The 
practical problem in exploiting this is that the flow resulting from pure rotational 
motion around an axis perpendicular to the line of sight is quite similar to the 
flow resulting from pure translation in the direction that is perpendicular to both 
the line of sight and this rotational axis, making it difficult to visually discriminate 
between the two very different types of motion (Figure 22.40(c)). Figure 22.41 
shows the optical flow patterns generated by movement through a more realistic 
environment. 

If a viewer is completely stationary, visual detection of moving objects is easy, 
since such objects will be associated with the only non-zero optic flow in the field 



Figure 22.41. The optic flow generated by moving through an otherwise static environment
provides information about both the motion relative to the environment and the distances to 
points in the environment. In this case, the direction of view is depressed from the horizon, 
but as indicated by the focus of expansion, the motion is parallel to the ground plane. 
Figure 22.42. Visual
detection of moving objects 
from a moving observation 
point requires recognizing 
patterns in the optic flow 
that cannot be associated 
with motion through a static 
environment. 


of view. The situation is considerably more complicated when the observer is 
moving, since the visual field will be dominated by non-zero flow, most or all of
which is due to relative motion between the observer and the static environment
(Thompson & Pong, 1990). In such cases, the visual system must be sensitive
to patterns in the optic flow field that are inconsistent with flow fields associated
with observer movement relative to a static environment (Figure 22.42). 

Section 22.3.4 described how vision can be used to determine time to contact 
with a point in the environment even when the speed of motion is not known. 
Assuming a viewer moving with a straight, constant-speed trajectory and no independently
moving objects in the world, contact will be made with whatever
surface is in the direction of the line of sight corresponding to the focus of expansion
at a time indicated by the τ relationship. Independently moving objects
complicate the matter of determining if a collision will in fact occur. Sailors use
a method for detecting potential collisions that may also be employed in the hu¬ 
man visual system: for non-accelerating straight-line motion, collisions will occur 
with objects that are visually expanding but otherwise remain visually stationary 
in the egocentric frame of reference. 
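
The sailors' rule amounts to checking for a constant bearing together with a decreasing range. The sketch below is a two-dimensional illustration under our own assumptions: the object is treated as a point, both observer and object move in straight lines at constant speed, the observer does not rotate, and positions are expressed relative to the observer. The function name, the time step, and the tolerance are hypothetical.

```python
import math


def on_collision_course(rel_pos, rel_vel, dt=1.0, tol=1e-9):
    """True if the object's bearing is constant while its range decreases.

    rel_pos and rel_vel are the object's 2D position and velocity relative
    to the observer. dt should be short compared with the time to contact,
    otherwise the object may pass through the observer within one step.
    """
    x0, y0 = rel_pos
    x1, y1 = x0 + rel_vel[0] * dt, y0 + rel_vel[1] * dt
    # Signed angle between the two lines of sight, from cross and dot products.
    bearing_change = math.atan2(x0 * y1 - y0 * x1, x0 * x1 + y0 * y1)
    closing = math.hypot(x1, y1) < math.hypot(x0, y0)
    return abs(bearing_change) < tol and closing
```

A decreasing range is what makes the object expand visually, and a constant bearing is what keeps it stationary in the egocentric frame; an object that satisfies only one of the two conditions will pass by or recede.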

One form of more complex event perception merits discussion here, since it is 
so important in interactive computer graphics. People are particularly sensitive to 
motion corresponding to human movement. Locomotion can be recognized when 
the only features visible are lights on the walker’s joints (Johansson, 1973). Such 
moving light displays are often even sufficient to recognize properties such as the 
sex of the walker and the weight of the load that the walker may be carrying. 
In computer graphics renderings, viewers will notice even small inaccuracies in 
animated characters, particularly if they are intended to mimic human motion. 

The term visual attention covers a range of phenomena from where we point
our eyes to cognitive effects involving what we notice in a complex scene and how 



Figure 22.43. In (a) and (b), visual attention is quickly drawn to the item of different shape 
or color. In (c), sequential search appears to be necessary in order to find the one item that 
differs in both shape and color. 
we interpret what we notice (Pashler, 1998). Figure 22.43 provides an example of
how attentional processes affect vision, even for very simple images. In the left 
two panels, the one pattern differing in shape or color from the rest immediately 
“pops out” and is easily noticed. In the panel on the right, the one pattern differ¬ 
ing in both shape and color is harder to find. The reason for this is that the visual 
system can do a parallel search for items distinguished by individual properties, 
but requires more cognitive, sequential search when looking for items that are in¬ 
dicated by the simultaneous presence of two distinguishing features. Graphically 
based human-computer interfaces should be (but often are not!) designed with an 
understanding of how to take advantage of visual attention processes in people so 
as to communicate important information quickly and effectively. 


22.5 Picture Perception 

So far, this chapter has dealt with the visual perception that occurs when the world 
is directly imaged by the human eye. When we view the results of computer 
graphics, of course, we are looking at rendered images and not the real world. 
This has important perceptual implications. In principle, it should be possible to 
generate computer graphics that appears indistinguishable from the real world, at 
least for monocular viewing without either object or observer motion. Imagine 
looking out at the world through a glass window. Now, consider coloring each 
point on the window to exactly match the color of the world originally seen at 
that point. 10 The light reaching the eye is unchanged by this operation, meaning 
that perception should be the same whether the painted glass is viewed or the 
real world is viewed through the window. The goal of computer graphics can be 
thought of as producing the colored window without actually having the equiva¬ 
lent real-world view available. 

The problem for computer graphics and other visual arts is that we can’t in 
practice match a view of the real world by coloring a flat surface. The brightness 
and dynamic range of light in the real world is impossible to recreate using any 
current display technology. Resolution of rendered images is also often less than
the finest detail perceivable by human vision. Lightness and color constancy are 
much less apparent in pictures than in the real world, likely because the visual 
system attempts to compensate for variability in the brightness and color of the 
illumination based on the ambient illumination in the viewing environment rather 
than the illumination associated with the rendered image. This is why the

10 This idea was first described by the painter Leon Battista Alberti in 1435 and is now known as
Alberti’s Window. It is closely related to the camera obscura.

realistic appearance of color in photographs depends on film color balanced for the
nature of the light source present when the photograph was taken and why real¬ 
istic color in video requires a white-balancing step. While much is known about 
how limitations in resolution, brightness, and dynamic range affect the detectabil¬ 
ity of simple patterns, almost nothing is known about how these display properties 
affect spatial vision or object identification. 

We have a better understanding of other aspects of this problem, which psy¬ 
chologists refer to as the perception of pictorial space (S. Rogers, 1995). One 
difference between viewing images and viewing the real world is that accommodation,
binocular stereo, motion parallax, and perhaps other depth cues may indicate
that the surface under view is much different than the distances in the world
that it is intended to represent. The depths that are seen in such a situation tend 
to be somewhere between the depths indicated by the pictorial cues in the image 
and the distance to the image itself. When looking at a photograph or computer 
display, this often results in a sense of scale smaller than intended. On the other 
hand, seeing a movie in a big-screen theater produces a more compelling sense of 
spaciousness than does seeing the same movie on television, even if the distance 
to the TV is such that the visual angles are the same, since the movie screen is 
farther away. 

Computer graphics rendered using perspective projection has a viewpoint, 
specified as a position and direction in model space, and a view frustum, which 
specifies the horizontal and vertical field of view and several other aspects of the 
viewing transform. If the rendered image is not viewed from the correct location, 
the visual angles to the borders of the image will not match the frustum used in 
creating the image. All visual angles within the image will be distorted as well, 
causing a distortion in all of the pictorial depth and orientation cues based on 
linear perspective. This effect occurs frequently in practice, when a viewer is po¬ 
sitioned either too close or too far away from a photograph or display surface. If 
the viewer is too close, the perspective cues for depth will be compressed, and the 
cues for surface slant will indicate that the surface is closer to perpendicular to the 
line of sight than is actually the case. The situation is reversed if the viewer is too 
far from the photograph or screen. The situation is even more complicated if the 
line of sight does not go through the center of the viewing area, as is commonly 
the case in a wide variety of viewing situations. 
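
The correct viewing location follows directly from the frustum. The sketch below assumes a symmetric frustum and an eye placed on the axis through the center of the image, which is our simplification; the function names are illustrative, and widths and distances may be in any consistent unit.

```python
import math


def correct_viewing_distance(image_width, horizontal_fov_deg):
    """Eye-to-image distance at which the image subtends the rendered field of view."""
    if image_width <= 0 or not 0.0 < horizontal_fov_deg < 180.0:
        raise ValueError("need positive width and a field of view in (0, 180)")
    half_fov = math.radians(horizontal_fov_deg) / 2.0
    return (image_width / 2.0) / math.tan(half_fov)


def subtended_angle_deg(image_width, viewing_distance):
    """Horizontal visual angle actually subtended by the image, in degrees."""
    if image_width <= 0 or viewing_distance <= 0:
        raise ValueError("width and distance must be positive")
    return math.degrees(2.0 * math.atan((image_width / 2.0) / viewing_distance))
```

For an image 0.6 units wide rendered with a 90 degree horizontal field of view, the matching distance is 0.3 units. Comparing subtended_angle_deg at the actual viewing distance with the rendered field of view indicates whether the viewer is too close (larger angle) or too far (smaller angle).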

The human visual system is able to partially compensate for perspective dis¬ 
tortions arising from viewing an image at the wrong location, which is why we 
are able to sit in different seats at a movie theater and experience a similar sense 
of the depicted space. When controlling viewing position is particularly impor¬ 
tant, viewing tubes can be used. These are appropriately sized tubes, mounted 
in a fixed position relative to the display, and through which the viewer sees the
display. The viewing tube constrains the observation point to the (hopefully) correct
position. Viewing tubes are also quite effective at reducing the conflict in
depth information between the pictorial cues in the image and the actual display 
surface. They eliminate both stereo and motion parallax, which if present would 
correspond to the display surface, not the rendered view. If they are small enough 
in diameter, they also reduce other cues to the location of the display surface by 
hiding the picture frame or edge of the display device. Exotic visually immersive 
display devices such as head-mounted displays (HMDs) go further in attempting 
to hide visual cues to the position of the display surface while adding binocu¬ 
lar stereo and motion parallax consistent with the geometry of the world being 
rendered. 



Erik Reinhard 


23 



Tone Reproduction 


As discussed in Chapter 22, the human visual system adapts to a wide range of 
viewing conditions. Under normal viewing, we may discern a range of around 4 
to 5 log units of illumination, i.e., the ratio between brightest and darkest areas 
where we can see detail may be as large as 100,000 : 1. Through adaptation 
processes, we may adapt to an even larger range of illumination. We call images 
that are matched to the capabilities of the human visual system high dynamic 
range. 

Visual simulations routinely produce images with a high dynamic range 
(Ward Larson & Shakespeare, 1998). Recent developments in image-capturing 
techniques allow multiple exposures to be aligned and recombined into a single 
high dynamic range image (Debevec & Malik, 1997). Multiple exposure tech¬ 
niques are also available for video. In addition, we expect future hardware to be 
able to photograph or film high dynamic range scenes directly. In general, we 
may think of each pixel as a triplet of floating point numbers.

As it is becoming easier to create high dynamic range imagery, the need to
display such data is rapidly increasing. Unfortunately, most current display
devices, such as monitors and printers, are only capable of displaying around
2 log units of dynamic range. We consider such devices to be of low dynamic
range. Most images in existence today are represented with a
byte-per-pixel-per-color channel, which is matched to current display devices,
rather than to the scenes they represent.

Typically, low dynamic range images are not able to represent scenes without
loss of information. A common example is an indoor room with an outdoor area
visible through the window. Humans are easily able to see details of both the
indoor part and the outside part. A conventional photograph typically does not
capture this full range of information: the photographer has to choose whether
the indoor or the outdoor part of the scene is properly exposed (see
Figure 23.1). These decisions may be avoided by using high dynamic range
imaging and preparing these images for display using techniques described in
this chapter (see Figure 23.2).

Figure 23.1. With conventional photography, some parts of the scene may be under- or
over-exposed. To visualize the snooker table, the view through the window is burned out in
the left image. On the other hand, the snooker table will be too dark if the outdoor part of this
scene is properly exposed. Compare with Figure 23.2, which shows a high dynamic range
image prepared for display using a tone reproduction algorithm.


There are two strategies available to display high dynamic range images. First,
we may develop display devices which can directly accommodate a high dynamic
range (Seetzen et al., 2003, 2004). Second, we may prepare high dynamic range
images for display on low dynamic range display devices (Upstill, 1985). This is
currently the more common approach and the topic of this chapter. We foresee
that high dynamic range display devices will become widely used in the (near)
future; the need to compress the dynamic range of an image may then diminish,
but it will not disappear. In particular, printed media such as this book are by
their very nature low dynamic range.

Figure 23.2. A high dynamic range image tonemapped for display using a recent
tone reproduction operator (Reinhard & Devlin, 2005). In this image, both the
indoor part and the view through the window are properly exposed.

Compressing the range of values of an image for the purpose of display on
a low dynamic range display device is called tonemapping or tone reproduction.
A simple compression function would be to normalize an image (see Figure 23.3
(left)). This constitutes a linear scaling which tends to be sufficient only if
the dynamic range of the image is only marginally higher than the dynamic range
of the display device. For images with a higher dynamic range, small intensity
differences will be quantized to the same display value such that visible
details are lost. In Figure 23.3 (middle) all pixel values larger than a
user-specified maximum are set to this maximum (i.e., they are clamped). This
makes the normalization less dependent on noisy outliers, but here we lose
information in the bright areas of the image. For comparison, Figure 23.3
(right) is a tonemapped version showing detail in both the dark and the bright
regions.

Figure 23.3. Linear scaling of high dynamic range images to fit a given display device may
cause significant detail to be lost (left and middle). The left image is linearly scaled. In the
middle image high values are clamped. For comparison, the right image is tonemapped,
allowing details in both bright and dark regions to be visible.
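
The two simple strategies described above, normalization and clamping, can be sketched in a few lines. This is only an illustration under stated assumptions: luminances are held in a NumPy array, the function names are hypothetical, and the sample values are invented to span five orders of magnitude.

```python
import numpy as np

def normalize_linear(lum):
    """Scale luminances linearly so that the brightest pixel maps to 1."""
    return lum / lum.max()

def normalize_clamped(lum, user_max):
    """Clamp luminances to user_max, then scale linearly to [0, 1]."""
    return np.minimum(lum, user_max) / user_max

def quantize(display):
    """Quantize display values in [0, 1] to 8-bit codes."""
    return np.round(display * 255.0).astype(np.uint8)

# Hypothetical luminances (not taken from the text).
lum = np.array([0.01, 0.02, 1.0, 50.0, 1000.0])

codes_linear = quantize(normalize_linear(lum))
codes_clamped = quantize(normalize_clamped(lum, user_max=1.0))

print(codes_linear)   # the three darkest values collapse to the same code
print(codes_clamped)  # the three brightest values collapse to the same code
```

With plain normalization the dark values are quantized to a single code, while clamping keeps them apart at the cost of merging every value above the chosen maximum.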

In general, linear scaling will not be appropriate for tone reproduction. The
key issue in tone reproduction is then to compress an image while at the same 
time preserving one or more attributes of the image. Different tone reproduction 
algorithms focus on different attributes such as contrast, visible detail, brightness 
or appearance. 

Ideally, displaying a tonemapped image on a low dynamic range display de¬ 
vice would create the same visual response in the observer as the original scene. 
Given the limitations of display devices, this will not be achievable, although we 
could aim for approximating this goal as closely as possible. 

As an example, we created the high dynamic range image shown in Figure 23.4.
This image was then tonemapped and displayed on a display device. The display
device itself was then placed in the scene such that it displays its own
background (Figure 23.5). In the ideal case, the display should appear
transparent. Dependent on the quality of the tone reproduction operator, as well
as the nature of the scene being depicted, this goal may be more or less
achievable.

Figure 23.4. Image used for demonstrating the goal of tone reproduction in
Figure 23.5.

Figure 23.5. After tonemapping the image in Figure 23.4 and displaying it on a monitor,
the monitor is placed in the scene approximately at the location where the image was taken.
Dependent on the quality of the tone reproduction operator, the result should appear as if the
monitor is transparent.

23.1 Classification 

Although it would be possible to classify tone reproduction operators by which 
attribute they aim to preserve, or for which task they were developed, we classify 
algorithms according to their general technique. This will enable us to show the 
differences and similarities between a significant number of different operators, 
and so, hopefully, contribute to the meaningful selection of specific operators for 
given tone reproduction tasks. 

The main classification scheme we follow hinges upon the realization that tone 
reproduction operators are based on insights gained from various disciplines. In 
particular, several operators are based on knowledge of human visual perception. 

The human visual system detects light using photoreceptors located in the 
retina. Light is converted to an electrical signal which is partially processed in 
the retina and then transmitted to the brain. Except for the first few layers of 
cells in the retina, the signal derived from detected light is transmitted using im¬ 
pulse trains. The information-carrying quantity is the frequency with which these 
electrical pulses occur. 

The range of light that the human visual system can detect is much larger 
than the range of frequencies employed by the human brain to transmit infor¬ 
mation. Thus, the human visual system effortlessly solves the tone reproduc¬ 
tion problem—a large range of luminances is transformed into a small range of 
frequencies of impulse trains. Emulating relevant aspects of the human visual 
system is therefore a worthwhile approach to tone reproduction; this approach is 
explained in more detail in Section 23.7. 

A second class of operators is grounded in physics. Light interacts with
surfaces and volumes before being absorbed by the photoreceptors. In computer
graphics, light interaction is generally modelled by the rendering equation. For
purely diffuse surfaces, this equation may be simplified to the product between
light incident upon a surface (illuminance), and this surface’s ability to reflect
light (reflectance) (Oppenheim et al., 1968).

Since reflectance is a passive property of surfaces, for diffuse surfaces it is, 
by definition, low dynamic range—typically between 0.005 and 1 (Stockham, 
1972). The reflectance of a surface cannot be larger than 1, since then it would 
reflect more light than was incident upon the surface. Illuminance, on the other 
hand, can produce arbitrarily large values and is limited only by the intensity and 
proximity of the light sources. 

The dynamic range of an image is thus predominantly governed by the illu¬ 
minance component. In the face of diffuse scenes, a viable approach to tone re¬ 
production may therefore be to separate reflectance from illuminance, compress 
the illuminance component, and then recombine the image. 

However, the assumption that all surfaces in a scene are diffuse is generally 
incorrect. Many high dynamic range images depict highlights and/or directly 
visible light sources (Figure 23.3). The luminance reflected by a specular surface 
may be almost as high as the light source it reflects. 

Various tone reproduction operators currently used split the image into a high 
dynamic range base layer and a low dynamic range detail layer. These layers 
would represent illuminance and reflectance if the depicted scene were entirely 
diffuse. For scenes containing directly visible light sources or specular highlights, 
separation into base and detail layers still allows the design of effective tone re¬ 
production operators, although no direct meaning can be attached to the separate 
layers. Such operators are discussed in Section 23.5. 


23.2 Dynamic Range 

Conventional images are stored with one byte per pixel for each of the red, green 
and blue components. The dynamic range afforded by such an encoding depends 
on the ratio between smallest and largest representable value, as well as the step 
size between successive values. Thus, for low dynamic range images, there are 
only 256 different values per color channel. 

High dynamic range images encode a significantly larger set of possible values;
the maximum representable value may be much larger and the step size between
successive values may be much smaller. The file size of high dynamic range
images is therefore generally larger as well, although at least one standard
(the OpenEXR high dynamic range file format (Kainz et al., 2003)) includes a
very capable compression scheme.

Figure 23.6. Dynamic range of 2.65 $\log_2$ units.

Figure 23.7. Dynamic range of 3.96 $\log_2$ units.

Figure 23.8. Dynamic range of 4.22 $\log_2$ units.

Figure 23.9. Dynamic range of 5.01 $\log_2$ units.

Figure 23.10. Dynamic range of 6.56 $\log_2$ units.

A different approach to limit file sizes is to apply a tone reproduction operator 
to the high dynamic data. The result may then be encoded in JPEG format. In 
addition, the input image may be divided pixel-wise by the tonemapped image. 
The result of this division can then be subsampled and stored as a small amount of 
data in the header of the same JPEG image (G. Ward & Simmons, 2004). The file 
size of such sub-band encoded images is of the same order as conventional JPEG 
encoded images. Display programs can display the JPEG image directly or may 
reconstruct the high dynamic range image by multiplying the tonemapped image 
with the data stored in the header. 
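
The divide-and-multiply idea behind this sub-band encoding can be sketched as follows. The sketch deliberately leaves out the subsampling of the ratio image and the JPEG coding itself, so the round trip here is exact, which it would not be in a real file. The function names and the stand-in tone curve are assumptions made for illustration only.

```python
import numpy as np

def encode_ratio(hdr_lum, tonemapped_lum, eps=1e-6):
    """Pixel-wise ratio between the HDR input and its tonemapped version."""
    return hdr_lum / np.maximum(tonemapped_lum, eps)

def decode_hdr(tonemapped_lum, ratio):
    """Recover the HDR luminance by multiplying the ratio back in."""
    return tonemapped_lum * ratio

# Hypothetical HDR luminances and a stand-in tone curve (assumption).
hdr = np.array([[0.05, 2.0], [40.0, 900.0]])
tonemapped = hdr / (1.0 + hdr)

ratio = encode_ratio(hdr, tonemapped)
restored = decode_hdr(tonemapped, ratio)
```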

In general, the combination of smallest step size and ratio of the smallest and 
largest representable values determines the dynamic range that an image encoding 
scheme affords. For computer-generated imagery, an image is typically stored as 
a triplet of floating point values before it is written to file or displayed on screen, 
although more efficient encoding schemes are possible (Reinhard et al., 2005). 
Since most display devices are still fitted with eight-bit D/A converters, we may 
think of tone reproduction as the mapping of floating point numbers to bytes such 
that the result is displayable on a low dynamic range display device.

The dynamic range of individual images is generally smaller, and is deter¬ 
mined by the smallest and largest luminances found in the scene. A simplistic 
approach to measure the dynamic range of an image may therefore compute the 
ratio between the largest and smallest pixel value of an image. Sensitivity to out¬ 
liers may be reduced by ignoring a small percentage of the darkest and brightest 
pixels. 

Alternatively, the same ratio may be expressed as a difference in the logarith¬ 
mic domain. This measure is less sensitive to outliers. The images shown in the 
margin on this page are examples of images with different dynamic ranges. Note 
that the night scene in this case does not have a smaller dynamic range than the 
day scene. While all the values in the night scene are smaller, the ratio between 
largest and smallest values is not. 
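
A minimal sketch of such a measurement is given below, assuming luminances in a NumPy array. The percentile-based trimming and the parameter names are illustrative choices, not a prescribed method; the example also shows that scaling every value (a darker "night" version of the same data) leaves the ratio unchanged.

```python
import numpy as np

def dynamic_range(lum, ignore_percent=1.0, base=10.0):
    """Return (ratio, log_units) between robust brightest and darkest pixels.

    ignore_percent of the darkest and of the brightest pixels is skipped
    to reduce sensitivity to outliers.
    """
    positive = lum[lum > 0]  # zero luminance has no finite ratio
    lo = np.percentile(positive, ignore_percent)
    hi = np.percentile(positive, 100.0 - ignore_percent)
    ratio = hi / lo
    log_units = (np.log(hi) - np.log(lo)) / np.log(base)
    return ratio, log_units

# Hypothetical data: the "day" values are the "night" values times 1000.
night = np.array([0.001, 0.01, 0.1, 1.0])
day = night * 1000.0

ratio_night, log_night = dynamic_range(night, ignore_percent=0.0)
ratio_day, log_day = dynamic_range(day, ignore_percent=0.0)
```

Passing `base=2.0` would report the range in $\log_2$ units instead.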

However, the recording device or rendering algorithm may introduce noise 
which will lower the useful dynamic range. Thus, a measurement of the dynamic 
range of an image should factor in noise. A better measure of dynamic range 
would therefore be a signal-to-noise ratio, expressed in decibels, as used in signal 
processing. 

Figure 23.11. Per-channel gamma correction may desaturate the image. The left image
was desaturated with a value of s = 0.5. The right image was not desaturated (s = 1). (See 
also Plate XIV) 


23.3 Color 


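Tone reproduction operators normally compress luminance values, rather than
work directly on the red, green, and blue components of a color image. After
these luminance values have been compressed into display values $L_d(x,y)$,
a color image may be reconstructed by keeping the ratios between color channels
the same as they were before compression (using $s = 1$) (Schlick, 1994b):

$$I_{r,d}(x,y) = \left(\frac{I_r(x,y)}{L_v(x,y)}\right)^{s} L_d(x,y),$$

$$I_{g,d}(x,y) = \left(\frac{I_g(x,y)}{L_v(x,y)}\right)^{s} L_d(x,y),$$

$$I_{b,d}(x,y) = \left(\frac{I_b(x,y)}{L_v(x,y)}\right)^{s} L_d(x,y).$$
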
The results frequently appear over-saturated, because human color perception is 
non-linear with respect to overall luminance level. This means that if we view 
an image of a bright outdoor scene on a monitor in a dim environment, our eyes 
are adapted to the dim environment rather than the outdoor lighting. By keeping 
color ratios constant, we do not take this effect into account. 

Alternatively, the saturation constant s may be chosen smaller than one. Such 
per-channel gamma correction may desaturate the results to an appropriate level, 
as shown in Figure 23.11 and Plate XIV (Fattal et al., 2002). A more compre¬ 
hensive solution is to incorporate ideas from the field of color appearance model¬ 
ing into tone reproduction operators (Pattanaik et al., 1998; Fairchild & Johnson, 
2004; Reinhard & Devlin, 2005). 
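
The effect of the saturation constant can be illustrated with a short sketch. It assumes the reconstruction takes the form $I_d = (I / L_v)^s L_d$ per channel, which is our reading of the description above; the function name and the sample pixel are hypothetical.

```python
import numpy as np

def reconstruct_color(rgb, lum, lum_display, s=1.0):
    """Rebuild a color image from compressed luminance.

    rgb:          (H, W, 3) input color values
    lum:          (H, W) input luminance L_v
    lum_display:  (H, W) compressed luminance L_d
    s:            saturation constant; s = 1 keeps channel ratios unchanged
    """
    ratios = rgb / lum[..., np.newaxis]
    return ratios ** s * lum_display[..., np.newaxis]

# One hypothetical pixel with a strong red cast.
rgb = np.array([[[8.0, 2.0, 1.0]]])
lum = np.array([[4.0]])   # assumed luminance for this pixel
ld = np.array([[0.5]])    # assumed display luminance

full = reconstruct_color(rgb, lum, ld, s=1.0)
muted = reconstruct_color(rgb, lum, ld, s=0.5)
```

With `s = 0.5` the spread between the strongest and weakest channel shrinks, which is the desaturation shown in Figure 23.11.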

Finally, if an example image with a representative color scheme is already
available, this color scheme may be applied to a new image. Such a mapping of
colors between images may be used for subtle color correction such as saturation
adjustment or for more creative color mappings. The mapping proceeds by
converting both source and target images to a decorrelated color space. In such
a color space, the pixel values in each color channel may be treated
independently without introducing too many artifacts (Reinhard et al., 2001).

Mapping colors from one image to another in a decorrelated color space is 
then straightforward: compute the mean and standard deviation of all pixels in the 
source and target images for the three color channels separately. 

Then, shift and scale the target image so that in each color channel the mean
and standard deviation of the target image are the same as those of the source
image. The resulting image is then obtained by converting from the decorrelated
color space to RGB and clamping negative pixels to zero. The dynamic range of
the image may have changed as a result of applying this algorithm. It is
therefore recommended to apply this algorithm on high dynamic range images
and apply a conventional tone reproduction algorithm afterwards. A suitable
decorrelated color space is the opponent space from Section 21.2.4.

The result of applying such a color transform to the image in Figure 23.12 is 
shown in Figure 23.13. 
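
The shift-and-scale step can be sketched as below. The sketch assumes both images have already been converted to a decorrelated color space and are stored as (H, W, 3) NumPy arrays; the conversion to and from RGB and the final clamping are omitted. The function name and the random test data are hypothetical.

```python
import numpy as np

def transfer_statistics(target, source, eps=1e-12):
    """Give each channel of target the mean and std of the same source channel."""
    t_mean = target.mean(axis=(0, 1))
    t_std = target.std(axis=(0, 1))
    s_mean = source.mean(axis=(0, 1))
    s_std = source.std(axis=(0, 1))
    # Center the target, rescale its spread, then move it to the source mean.
    scale = s_std / np.maximum(t_std, eps)
    return (target - t_mean) * scale + s_mean

# Synthetic stand-ins for two images in a decorrelated space (assumption).
rng = np.random.default_rng(0)
target = rng.normal(0.2, 0.05, size=(16, 16, 3))
source = rng.normal(0.6, 0.20, size=(16, 16, 3))

result = transfer_statistics(target, source)
```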



Figure 23.12. Image used for demonstrat¬ 
ing the color transfer technique. Results are 
shown in Figures 23.13 and 23.31. (See 
also Plates XV, XVI and XVIII.) 



Figure 23.13. The image on the left is used to adjust the colors of the image shown in 
Figure 23.12. The result is shown on the right. (See also Plate XVI.) 

23.4 Image Formation

For now we assume that an image is formed as the result of light being diffusely
reflected off of surfaces. Later in this chapter we relax this constraint to scenes
directly depicting light sources and highlights. The luminance $L_v$ of each pixel is
then approximated by the following product:

$$L_v(x,y) = r(x,y)\, E_v(x,y).$$

Here, $r$ denotes the reflectance of a surface, and $E_v$ denotes the illuminance. The
subscript $v$ indicates that we are using photometrically weighted quantities.
Alternatively, we may write this expression in the logarithmic domain (Oppenheim
et al., 1968):

$$\begin{aligned}
D(x,y) &= \log\left(L_v(x,y)\right) \\
       &= \log\left(r(x,y)\, E_v(x,y)\right) \\
       &= \log\left(r(x,y)\right) + \log\left(E_v(x,y)\right).
\end{aligned}$$

Photographic transparencies record images by varying the density of the material. 
In traditional photography, this variation has a logarithmic relation with lumi¬ 
nance. Thus, in analogy with common practice in photography, we will use the 
term density representation (D) for log luminance. When represented in the log 
domain, reflectance and illuminance become additive. This facilitates separation 
of these two components, despite the fact that isolating either reflectance or il¬ 
luminance is an under-constrained problem. In practice, separation is possible 
only to a certain degree and depends on the composition of the image. Nonethe¬ 
less, tone reproduction could be based on disentangling these two components of 
image formation, as shown in the following two sections. 


23.5 Frequency-Based Operators 

For typical diffuse scenes, the reflectance component tends to exhibit high spatial 
frequencies due to textured surfaces as well as the presence of surface edges. On 
the other hand, illuminance tends to be a slowly varying function over space. 

Since reflectance is low dynamic range and illuminance is high dynamic range,
we may try to separate the two components. The frequency-dependence of both
reflectance and illuminance provides a solution. We may for instance compute
the Fourier transform of an image and attenuate only the low frequencies. This
compresses the illuminance component while leaving the reflectance component
largely unaffected. The very first digital tone reproduction operator known to
us takes this approach (Oppenheim et al., 1968).

Figure 23.14. Bilateral filtering removes small details but preserves sharp gradients (left).
The associated detail layer is shown on the right.

More recently, other operators have also followed this line of reasoning. In 
particular, bilateral and trilateral filters were used to separate an image into base 
and detail layers (Durand & Dorsey, 2002; Choudhury & Tumblin, 2003). Both 
filters are edge-preserving smoothing operators which may be used in a variety of 
different ways. Applying an edge-preserving smoothing operator to a density im¬ 
age results in a blurred image in which sharp edges remain present (Figure 23.14 
(left)). We may view such an image as a base layer. If we then pixel-wise divide 
the high dynamic range image by the base layer, we obtain a detail layer which 
contains all the high frequency detail (Figure 23.14 (right)). 

For diffuse scenes, base and detail layers are similar to representations of
illuminance and reflectance. For images depicting highlights and light sources,
this parallel does not hold. However, separation of an image into base and
detail layers is possible regardless of the image’s content. By compressing
the base layer before recombining into a compressed density image, a low
dynamic range density image may be created (Figure 23.15). After exponentiation,
a displayable image is obtained.

Edge-preserving smoothing operators may also be used to compute a local
adaptation level for each pixel, which may be used in a spatially varying or
local tone reproduction operator. We describe this use of bilateral and
trilateral filters in Section 23.7.

Figure 23.15. An image tonemapped using bilateral filtering. The base and detail
layers shown in Figure 23.14 are recombined after compressing the base layer.
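
The base/detail pipeline can be sketched with a brute-force bilateral filter. This is an unoptimized illustration, not the implementation of any cited operator: the filter parameters, the compression factor, and the choice to anchor the brightest base value at a display value of 1 are all assumptions. Working on the density (log) image turns the pixel-wise division into a subtraction.

```python
import numpy as np

def bilateral_filter(img, radius=2, sigma_s=1.5, sigma_r=0.4):
    """Brute-force bilateral filter on a 2D float array (edge-padded)."""
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")
    acc = np.zeros_like(img)
    weight_sum = np.zeros_like(img)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = padded[radius + dy:radius + dy + h,
                             radius + dx:radius + dx + w]
            spatial = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2))
            # Neighbors with very different values get little weight,
            # which is what preserves sharp edges.
            range_w = np.exp(-((shifted - img) ** 2) / (2.0 * sigma_r ** 2))
            weight = spatial * range_w
            acc += weight * shifted
            weight_sum += weight
    return acc / weight_sum

def tonemap_base_detail(lum, compression=0.4, delta=1e-6):
    """Compress only the base layer of the density image, keep the detail."""
    density = np.log(lum + delta)
    base = bilateral_filter(density)
    detail = density - base            # division in the linear domain
    compressed = compression * (base - base.max()) + detail
    return np.exp(compressed)

# Hypothetical test image: a 1:1000 step edge with 10% stripe texture.
lum = np.ones((8, 8))
lum[:, 4:] = 1000.0
lum[::2, :] *= 1.1

out = tonemap_base_detail(lum)
```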

Figure 23.16. The image on the left (tonemapped using gradient-domain compression)
shows a scene with highlights. These highlights show up as large gradients on the right, 
where the magnitude of the gradients is mapped to a grayscale (black is a gradient of 0, 
white is the maximum gradient in the image). 

23.6 Gradient-Domain Operators 

The arguments made for the frequency-based operators in the preceding section
also hold for the gradient field. Assuming that no light sources are directly visible,
the reflectance component will be a piecewise constant function, which produces
sharp spikes in the gradient field. Similarly, the illuminance component will
cause small gradients everywhere.

Humans are generally able to separate illuminance from reflectance in typical 
scenes. The perception of surface reflectance after discounting the illuminant is 
called lightness. To assess the lightness of an image depicting only diffuse sur¬ 
faces, B. K. P. Horn was the first to separate reflectance and illuminance using a 
gradient field (Horn, 1974). He used simple thresholding to remove all small gra¬ 
dients and then integrated the image, which involves solving a Poisson equation 
using the Full Multigrid Method (Press et al., 1992). 

The result is similar to an edge-preserving smoothing filter. This is accord¬ 
ing to expectation since Oppenheim’s frequency-based operator works under the 
same assumptions of scene reflectivity and image formation. In particular, Horn’s 
work was directly aimed at “mini-worlds of Mondrians,” which are simplified 
versions of diffuse scenes which resemble the abstract paintings by the famous 
Dutch painter Piet Mondrian. 

Horn’s work cannot be employed directly as a tone reproduction operator, 
since most high dynamic range images depict light sources. However, a relatively 
small variation will turn this work into a suitable tone reproduction operator. If 
light sources or specular surfaces are depicted in the image, then large gradients 
will be associated with the edges of light sources and highlights. These cause the 
image to have a high dynamic range. An example is shown in Figure 23.16, where 
the highlights on the snooker balls cause sharp gradients. 

We could therefore compress a high dynamic range image by attenuating large
gradients, rather than thresholding the gradient field. This approach was taken
by Fattal et al. who showed that high dynamic range imagery may be successfully
compressed by integrating a compressed gradient field (Figure 23.17) (Fattal
et al., 2002). Fattal’s gradient-domain compression is not limited to diffuse
scenes.

Figure 23.17. An image tonemapped using gradient-domain compression.
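
The principle can be illustrated on a one-dimensional scanline, where integrating the gradient field reduces to a cumulative sum; in two dimensions a Poisson equation would have to be solved instead. The power-law attenuation used here, and the values of `alpha` and `beta`, are assumptions chosen for illustration and should not be read as the exact function of the cited operator.

```python
import numpy as np

def compress_gradients_1d(lum, alpha=0.1, beta=0.8, delta=1e-6):
    """Attenuate large log-luminance gradients on a scanline, then integrate."""
    density = np.log(lum + delta)
    grad = np.diff(density)
    mag = np.abs(grad)
    # Scale factor (|g| / alpha)^(beta - 1): below 1 for gradients larger
    # than alpha, above 1 for smaller ones. Zero gradients are left alone.
    scale = np.ones_like(grad)
    nonzero = mag > 0
    scale[nonzero] = (mag[nonzero] / alpha) ** (beta - 1.0)
    new_grad = grad * scale
    # Integrate the modified gradients; the first sample is the free constant.
    new_density = np.concatenate(([0.0], np.cumsum(new_grad)))
    new_density -= new_density.max()   # brightest sample maps to 1
    return np.exp(new_density)

# Hypothetical scanline crossing from a dim region into a highlight.
lum = np.array([1.0, 1.1, 1.0, 1000.0, 1100.0, 1000.0])
out = compress_gradients_1d(lum)
```

Because every gradient is multiplied by a positive factor, the ordering of neighboring samples is preserved while the large step is reduced.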


23.7 Spatial Operators 

In the following sections, we discuss tone reproduction operators which apply 
compression directly on pixels without transformation to other domains. Often 
global and local operators are distinguished. Tone reproduction operators in the 
former class change each pixel’s luminance values according to a compressive 
function which is the same for each pixel. The term global stems from the fact that 
many such functions need to be anchored to some values determined by analyzing 
the full image. In practice, most operators use the geometric average
$\bar{L}_v$ to steer the compression:

$$\bar{L}_v = \exp\left(\frac{1}{N} \sum_{x,y} \log\left(\delta + L_v(x,y)\right)\right). \tag{23.1}$$

In Equation (23.1), $N$ is the number of pixels, and a small constant $\delta$ is
introduced to prevent the average from becoming zero in the presence of black
pixels. The geometric average is normally mapped to a predefined display value.
The effect of mapping the geometric average to different display values is
shown in Figure 23.18. Alternatively, sometimes the minimum or maximum image
luminance is used. The main challenge faced in the design of a global operator
lies in the choice of the compressive function.
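
A minimal sketch of the geometric average, and of scaling an image so that this average lands on a chosen value, is shown below. The function names, the default for `delta`, and the sample values are assumptions for illustration.

```python
import numpy as np

def log_average(lum, delta=1e-6):
    """Geometric average of luminance: exp of the mean log luminance."""
    return float(np.exp(np.mean(np.log(delta + lum))))

def scale_to_key(lum, key, delta=1e-6):
    """Scale luminances so that the geometric average maps to `key`."""
    return lum * (key / log_average(lum, delta))

# Hypothetical image containing a black pixel; delta keeps the log finite.
lum = np.array([[0.0, 1.0], [10.0, 100.0]])
avg = log_average(lum)
scaled = scale_to_key(lum, key=0.18)
```

Without `delta`, a single black pixel would drive the logarithm to minus infinity and the average to zero.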

On the other hand, local operators compress each pixel according to a specific
compression function which is modulated by information derived from a selection
of neighboring pixels, rather than the full image. The rationale is that a bright
pixel in a bright neighborhood may be perceived differently than a bright pixel in
a dim neighborhood. Design challenges in the development of a local operator
involve choosing the compressive function, the size of the local neighborhood
for each pixel, and the manner in which local pixel values are used. In general,
local operators achieve better compression than global operators (Figure 23.19),
albeit at a higher computational cost.

Figure 23.18. Spatial tonemapping operator applied after mapping the geometric average
to different display values (left: 0.12, right: 0.38).

Figure 23.19. A global tone reproduction operator (left) and a local tone reproduction
operator (right) (Reinhard et al., 2002) of each image. The local operator shows more detail;
for example the metal badge on the right shows better contrast and the highlights are crisper.

Both global and local operators are often inspired by the human visual system.
Most operators employ one of two distinct compressive functions, a choice that
is orthogonal to the distinction between local and global operators. Display
values $L_d(x,y)$ are most commonly derived from image luminances $L_v(x,y)$ by
the following two functional forms:

$$L_d(x,y) = \frac{L_v(x,y)}{f(x,y)}, \tag{23.2}$$

$$L_d(x,y) = \frac{L_v(x,y)}{L_v(x,y) + f^n(x,y)}. \tag{23.3}$$



In these equations, $f(x,y)$ may either be a constant or a function which varies per
pixel. In the former case, we have a global operator, whereas a spatially varying
function $f(x,y)$ results in a local operator. The exponent $n$ is usually a constant
which is fixed for a particular operator.

Equation (23.2) divides each pixel’s luminance by a value derived from either 
the full image or a local neighborhood. Equation (23.3) has an S-shaped curve on 
a log-linear plot and is called a sigmoid for that reason. This functional form fits 
data obtained from measuring the electrical response of photoreceptors to flashes 
of light in various species. In the following sections, we discuss both functional 
forms. 


23.8 Division 


Each pixel may be divided by a constant to bring the high dynamic range image
within a displayable range. Such a division essentially constitutes linear scaling,
as shown in Figure 23.3. While Figure 23.3 shows ad-hoc linear scaling, this
approach may be refined by employing psychophysical data to derive the scaling
constant $f(x,y) = k$ in Equation (23.2) (G. J. Ward, 1994; Ferwerda et al., 1996).

Alternatively, several approaches exist that compute a spatially varying
divisor. In each of these cases, $f(x,y)$ is a blurred version of the image, i.e.,
$f(x,y) = L_v^{\mathrm{blur}}(x,y)$. The blur is achieved by convolving the image with a
Gaussian filter (Chiu et al., 1993; Rahman et al., 1996). In addition, the
computation of $f(x,y)$ by blurring the image may be combined with a shift in white
point for the purpose of color appearance modeling (Fairchild & Johnson, 2002;
G. M. Johnson & Fairchild, 2003; Fairchild & Johnson, 2004).

The size and the weight of the Gaussian filter have a profound impact on the
resulting displayable image. The Gaussian filter has the effect of selecting a
weighted local average. Tone reproduction is then a matter of dividing each pixel
by its associated weighted local average. If the size of the filter kernel is chosen
too small, then haloing artifacts will occur (Figure 23.20 (left)). Haloing is a
common problem with local operators and is particularly evident when tone
mapping relies on division.

Figure 23.20. Images tonemapped by dividing by Gaussian blurred versions. The size
of the filter kernel is 64 pixels for the left image and 512 pixels for the right image. For 
division-based algorithms, halo artifacts are minimized by choosing large filter kernels. 


In general, haloing artifacts may be minimized in this approach by making the
filter kernel large (Figure 23.20 (right)). Reasonable results may be obtained by
choosing a filter size of at least one quarter of the image. Sometimes even larger
filter kernels are desirable to minimize artifacts. Note that in the limit, the filter
size becomes as large as the image itself. In that case the local operator becomes
global, and the extra compression normally afforded by a local approach is lost.

The functional form whereby each pixel is divided by a Gaussian blurred pixel 
at the same spatial position thus requires an undesirable tradeoff between amount 
of compression and severity of artifacts. 
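
Division by a Gaussian-blurred copy of the image can be sketched with NumPy alone. The separable blur, the use of a standard deviation `sigma` to control kernel size, and the small `eps` guarding against division by zero are assumptions of this sketch rather than details given in the text.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at three standard deviations."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x * x) / (2.0 * sigma * sigma))
    return k / k.sum()

def gaussian_blur(img, sigma):
    """Separable Gaussian blur of a 2D array with edge padding."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="edge")
    rows = np.apply_along_axis(
        lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def tonemap_division(lum, sigma, eps=1e-6):
    """Divide each pixel by its Gaussian-weighted local average."""
    return lum / (gaussian_blur(lum, sigma) + eps)

# A constant image is its own local average, so the result is 1 everywhere.
flat = np.full((6, 6), 5.0)
flat_out = tonemap_division(flat, sigma=1.0)

# Hypothetical step edge: small kernels over- and undershoot near the edge,
# which is the origin of the halo artifacts discussed above.
step = np.ones((8, 8))
step[:, 4:] = 100.0
step_out = tonemap_division(step, sigma=1.0)
```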


23.9 Sigmoids 

Equation (23.3) follows a different functional form from simple division, and, 
therefore, affords a different tradeoff between amount of compression, presence 
of artifacts, and speed of computation. 

Sigmoids have several desirable properties. For very small luminance values, 
the mapping is approximately linear, so that contrast is preserved in dark areas of 
the image. The function has an asymptote at one, which means that the output 
mapping is always bounded between 0 and 1. 

In Equation (23.3), the function $f(x,y)$ may be computed as a global constant
or as a spatially varying function. Following common practice in
electrophysiology, we call $f(x,y)$ the semi-saturation constant. Its value
determines which values in the input image are optimally visible after
tonemapping. In particular, if we assume that the exponent $n$ equals 1, then
luminance values equal to the semi-saturation constant will be mapped to 0.5.
The effect of choosing different semi-saturation constants is shown in
Figure 23.21.
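
The properties claimed for the sigmoid follow directly from its functional form. The short derivation below assumes Equation (23.3) reads $L_d = L_v / (L_v + f^n)$ and drops the pixel coordinates for brevity.

```latex
% Assumed form of Equation (23.3): L_d = L_v / (L_v + f^n)
\begin{align*}
L_v \ll f^n &: \quad L_d = \frac{L_v}{L_v + f^n} \approx \frac{L_v}{f^n}
  && \text{(approximately linear in dark areas)} \\
L_v \to \infty &: \quad L_d \to 1
  && \text{(asymptote at one)} \\
n = 1,\ L_v = f &: \quad L_d = \frac{f}{f + f} = \frac{1}{2}
  && \text{(semi-saturation)}
\end{align*}
```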

Figure 23.21. The choice of semi-saturation constant determines how input values are
mapped to display values. 

The function $f(x,y)$ may be computed in several different ways (Reinhard
et al., 2005). In its simplest form, $f(x,y)$ is set to $\bar{L}_v / k$, so that the
geometric average is mapped to user parameter $k$ (Figure 23.22) (Reinhard et al.,
2002). In this case, a good initial value for $k$ is 0.18, although for particularly
bright or dark scenes this value may be raised or lowered. Its value may be
estimated from the image itself (Reinhard, 2003). The exponent $n$ in
Equation (23.3) may be set to 1.
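
A minimal sketch of this global sigmoid follows, assuming the form $L_d = L_v / (L_v + f^n)$ with $f = \bar{L}_v / k$. The function name and the sample luminances are hypothetical. Note that with this form a pixel exactly at the geometric average evaluates to $k / (1 + k)$, slightly below $k$ itself.

```python
import numpy as np

def sigmoid_global(lum, k=0.18, n=1.0, delta=1e-6):
    """Global sigmoidal compression with semi-saturation f = average / k."""
    avg = np.exp(np.mean(np.log(delta + lum)))  # geometric average
    f = avg / k
    return lum / (lum + f ** n)

# Hypothetical luminances whose geometric average is 10.
lum = np.array([1.0, 10.0, 100.0])
display = sigmoid_global(lum)
```

Raising `k` brightens the result, since the semi-saturation constant drops and more of the input range is pushed toward the upper part of the curve.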

Figure 23.22. A linearly scaled image (left) and an image tonemapped using sigmoidal
compression (right).

Figure 23.23. Linear interpolation varies contrast in the tonemapped image. The parameter
$a$ is set to 0.0 in the left image, and to 1.0 in the right image.

In this approach, the semi-saturation constant is a function of the geometric
average, and the operator is therefore global. A variation of this global
operator computes the semi-saturation constant by linearly interpolating between
the geometric average and each pixel's luminance:

$$f(x,y) = a\, L_v(x,y) + (1 - a)\, \bar{L}_v.$$


The interpolation is governed by user parameter a, which has the effect of
varying the amount of contrast in the displayable image (Figure 23.23)
(Reinhard & Devlin, 2005). More contrast means less visible detail in the light
and dark areas and vice versa. This interpolation may be viewed as a half-way
house between a fully global and a fully local operator by interpolating between
the two extremes without resorting to expensive blurring operations.

Although operators typically compress luminance values, this particular
operator may be extended to include a simple form of chromatic adaptation. It
thus presents an opportunity to adjust the level of saturation normally
associated with tonemapping, as discussed at the beginning of this chapter.

Rather than compress the luminance channel only, sigmoidal compression is
applied to each of the three color channels:

$$
\begin{aligned}
I_{r,d}(x, y) &= \frac{I_r(x, y)}{I_r(x, y) + f^n(x, y)}, \\
I_{g,d}(x, y) &= \frac{I_g(x, y)}{I_g(x, y) + f^n(x, y)}, \\
I_{b,d}(x, y) &= \frac{I_b(x, y)}{I_b(x, y) + f^n(x, y)}.
\end{aligned}
$$


Figure 23.24. Linear interpolation for color correction. The parameter c is set to 0.0 in the
left image, and to 1.0 in the right image. (See also Plate XVII.)

The computation of f(x, y) is also modified to bilinearly interpolate between the
geometric average luminance and pixel luminance and between each independent
color channel and the pixel's luminance value. We therefore compute the
geometric average luminance value $\bar{L}_v$, as well as the geometric average
of the red, green, and blue channels ($\bar{I}_r$, $\bar{I}_g$, and $\bar{I}_b$).
From these values, we compute f(x, y) for each pixel and for each color channel
independently. We show the equation for the red channel ($f_r(x, y)$):

$$
\begin{aligned}
G_r(x, y) &= c\,I_r(x, y) + (1 - c)\,L_v(x, y), \\
\bar{G}_r(x, y) &= c\,\bar{I}_r + (1 - c)\,\bar{L}_v, \\
f_r(x, y) &= a\,G_r(x, y) + (1 - a)\,\bar{G}_r(x, y).
\end{aligned}
$$


The interpolation parameter a steers the amount of contrast as before, and the new 
interpolation parameter c allows a simple form of color correction (Figure 23.24 
and Plate XVII). 
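
To make the roles of the two interpolation parameters concrete, the sketch below evaluates the semi-saturation value for one color channel of one pixel. It is an illustration under stated assumptions: the function names are hypothetical, and the per-channel compression uses the form I / (I + f^n) shown above.

```python
def semi_saturation(I_c, L, I_c_mean, L_mean, a, c):
    """Semi-saturation value for one channel of one pixel.

    I_c, L           : this pixel's channel value and luminance
    I_c_mean, L_mean : geometric averages of that channel and of luminance
    a                : contrast parameter (0 = global average, 1 = per pixel)
    c                : color-correction parameter (0 = luminance, 1 = channel)
    """
    G_local = c * I_c + (1.0 - c) * L             # pixel-level blend
    G_global = c * I_c_mean + (1.0 - c) * L_mean  # image-level blend
    return a * G_local + (1.0 - a) * G_global

def compress_channel(I_c, f, n=1.0):
    """Per-channel sigmoid, I / (I + f^n)."""
    return I_c / (I_c + f ** n)

# The four corner cases select exactly one of the four inputs.
f_global_lum = semi_saturation(2.0, 3.0, 5.0, 7.0, a=0.0, c=0.0)  # the luminance average
f_local_chan = semi_saturation(2.0, 3.0, 5.0, 7.0, a=1.0, c=1.0)  # the pixel's channel value
```

Setting a = c = 0 recovers the purely global, luminance-driven operator, which is a convenient sanity check when adding the extra parameters to an existing implementation.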

So far we have not discussed the value of the exponent n in Equation (23.3).
Studies in electrophysiology report values between n = 0.2 and n = 0.9 (Hood et
al., 1979). While the exponent may be user-specified, for a wide variety of images
we may estimate a reasonable value from the geometric average luminance
$\bar{L}_v$ and the minimum and maximum luminance in the image ($L_{min}$ and
$L_{max}$) with the following empirical equation:

$$ n = 0.3 + 0.7 \left( \frac{L_{max} - \bar{L}_v}{L_{max} - L_{min}} \right)^{1.4}. $$

The several variants of sigmoidal compression shown so far are all global in
nature. This has the advantage that they are fast to compute, and they are very
suitable for medium to high dynamic range images. For very high dynamic range
images, it may be necessary to resort to a local operator, since this may give some
extra compression. A straightforward method to extend sigmoidal compression
replaces the global semi-saturation constant by a spatially varying function, which
may be computed in several different ways.

In other words, the function f(x, y) is so far assumed to be constant, but may
also be computed as a spatially localized average. Perhaps the simplest way to
accomplish this is to once more use a Gaussian blurred image. Each pixel in
a blurred image represents a locally averaged value which may be viewed as a
suitable choice for the semi-saturation constant.¹

As with division-based operators discussed in the previous section, we have
to consider haloing artifacts. However, when an image is divided by a Gaussian
blurred version of itself, the size of the Gaussian filter kernel needs to be large
in order to minimize halos. If sigmoids are used with a spatially variant
semi-saturation constant, the Gaussian filter kernel needs to be made small in
order to minimize artifacts. This is a significant improvement, since small
amounts of Gaussian blur may be efficiently computed directly in the spatial
domain. In other words, there is no need to resort to expensive Fourier
transforms. In practice, filter kernels only a few pixels wide are sufficient to
suppress significant artifacts while at the same time producing more local
contrast in the tonemapped images.

One potential issue with Gaussian blur is that the filter blurs across sharp
contrast edges in the same way that it blurs small details. In practice, if there
is a large contrast gradient in the neighborhood of the pixel under
consideration, this causes the Gaussian-blurred pixel to be significantly
different from the pixel itself. This is the direct cause of halos. By using a
very large filter kernel in a division-based approach, such large contrasts are
averaged out.

In sigmoidal compression schemes, a small Gaussian filter minimizes the
chances of overlapping with a sharp contrast gradient. In that case, halos
still occur, but their size is such that they usually go unnoticed and instead
are perceived as enhancing contrast.

Another way to blur an image, while minimizing the negative effects of nearby
large contrast steps, is to avoid blurring over such edges. A simple but
computationally expensive way is to compute a stack of Gaussian-blurred images
with different kernel sizes. For each pixel, we may choose the largest Gaussian
that does not overlap with a significant gradient.

In a relatively uniform neighborhood, the value of a Gaussian-blurred pixel
should be the same regardless of the filter kernel size. Thus, the difference
between a pixel filtered with two different Gaussians should be approximately zero.

¹Although f(x, y) is now no longer a constant, we continue to refer to it as the
semi-saturation constant.

Figure 23.25. Example image used to demonstrate the scale selection mechanism
shown in Figure 23.26.

Figure 23.26. Scale selection mechanism: the left image shows the scale selected for each
pixel of the image shown in Figure 23.25; the darker the pixel, the smaller the scale. A total
of eight different scales were used to compute this image. The right image shows the local
average computed for each pixel on the basis of the neighborhood selection mechanism.


This difference will only change significantly if the wider filter kernel overlaps 
with a neighborhood containing a sharp contrast step, whereas the smaller filter 
kernel does not. 

It is possible, therefore, to find the largest neighborhood around a pixel that 
does not contain sharp edges by examining differences of Gaussians at different 
kernel sizes. For the image shown in Figure 23.25, the scale selected for each pixel 
is shown in Figure 23.26 (left). Such a scale selection mechanism is employed by 
the photographic tone reproduction operator (Reinhard et al., 2002) as well as in 
Ashikhmin’s operator (Ashikhmin, 2002). 
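
The scale selection idea can be illustrated on a one-dimensional signal. The sketch below is a simplified stand-in, not the published operators: the normalization of the difference by (1 + blurred value), the threshold of 0.05, and the border clamping are all assumptions made for this example, and the function names are hypothetical.

```python
import math

def gaussian_blur_1d(signal, sigma):
    """Blur a 1D list with a truncated Gaussian, clamping at the borders."""
    radius = max(1, int(3 * sigma))
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in offsets]
    total = sum(weights)
    last = len(signal) - 1
    out = []
    for x in range(len(signal)):
        acc = 0.0
        for i, w in zip(offsets, weights):
            acc += w * signal[min(max(x + i, 0), last)]
        out.append(acc / total)
    return out

def select_scales(signal, sigmas, threshold=0.05):
    """For each sample, pick the largest scale before two successive blurs disagree."""
    blurred = [gaussian_blur_1d(signal, s) for s in sigmas]
    chosen = []
    for x in range(len(signal)):
        k = 0
        while k + 1 < len(sigmas):
            diff = abs(blurred[k][x] - blurred[k + 1][x]) / (1.0 + blurred[k][x])
            if diff > threshold:      # the wider kernel has reached an edge
                break
            k += 1
        chosen.append(k)
    local_average = [blurred[k][x] for x, k in enumerate(chosen)]
    return chosen, local_average

step = [1.0] * 32 + [100.0] * 32       # one sharp contrast edge in the middle
scales, averages = select_scales(step, sigmas=[1.0, 2.0, 4.0, 8.0])
```

Samples far from the edge end up with the largest scale, while samples next to the edge keep the smallest one, so the local average never mixes the two sides of the step by much.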

Once the appropriate neighborhood for each pixel is known, the Gaussian
blurred average $\bar{L}_{blur}$ for this neighborhood (shown on the right of
Figure 23.26) may be used to steer the semi-saturation constant, as is done, for
instance, by the photographic tone reproduction operator:

$$ L_d = \frac{L_v}{1 + \bar{L}_{blur}}. $$

An alternative, and arguably better, approach is to employ edge-preserving
smoothing operators, which are designed specifically for removing small details
while keeping sharp contrasts intact. Several such filters, such as the bilateral
filter (Figure 23.27), trilateral filter, SUSAN filter, the LCIS algorithm, and the
mean shift algorithm are suitable, although some of them are expensive to compute
(Durand & Dorsey, 2002; Choudhury & Tumblin, 2003; Pattanaik & Yee, 2002;
Tumblin & Turk, 1999; Comaniciu & Meer, 2002).


23.10 Other Approaches 

Although the previous sections together discuss most tone reproduction operators
to date, there are one or two operators that do not directly fit into the above
categories. The simplest of these are variations of logarithmic compression, and
the other is a histogram-based approach.

Figure 23.27. Sigmoidal compression (left) and sigmoidal compression using bilateral
filtering to compute the semi-saturation constant (right). Note the improved contrast in the
sky in the right image.

Dynamic range reduction may be accomplished by taking the logarithm, provided
that the number is greater than 1. Any positive number may then be nonlinearly
scaled between 0 and 1 using the following equation:

$$ L_d(x, y) = \frac{\log_b(1 + L_v(x, y))}{\log_b(1 + L_{max})}. $$
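
As a small illustration of this mapping, the following Python sketch evaluates it for a single luminance value. The function name `log_tonemap` is hypothetical. Because the same base appears in the numerator and the denominator, the result does not depend on which fixed base is used, which the last two lines demonstrate.

```python
import math

def log_tonemap(L, L_max, base=10.0):
    """Map a luminance L in [0, L_max] to [0, 1] with a ratio of logarithms."""
    return math.log(1.0 + L, base) / math.log(1.0 + L_max, base)

darkest = log_tonemap(0.0, 100.0)      # lower end of the range maps to 0
brightest = log_tonemap(100.0, 100.0)  # upper end of the range maps to 1

# Any fixed base gives the same answer, since the base cancels in the ratio.
with_base_2 = log_tonemap(5.0, 100.0, base=2.0)
with_base_10 = log_tonemap(5.0, 100.0, base=10.0)
```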


While the base b of the logarithm above is not specified, any choice of base will
do. This freedom to choose the base of the logarithm may be used to vary the
base with input luminance, and thus achieve an operator that is better matched to
the image being compressed (Drago et al., 2003). This method uses Perlin and
Hoffert's bias function, which takes user parameter p (Perlin & Hoffert, 1989):

$$ \mathrm{bias}_p(x) = x^{\log_{10}(p) / \log_{10}(1/2)}. $$



Figure 23.28. Logarithmic compression using base 10 logarithms (left) and logarithmic
compression with varying base (right).

Making the base b dependent on luminance and smoothly interpolating bases
between 2 and 10, the logarithmic mapping above may be refined:

$$
L_d(x, y) = \frac{\log_{10}(1 + L_v(x, y))}{\log_{10}(1 + L_{max})}
\cdot \frac{1}{\log_{10}\!\left( 2 + 8 \left( \dfrac{L_v(x, y)}{L_{max}} \right)^{\log_{10}(p) / \log_{10}(1/2)} \right)}.
$$


For user parameter p, an initial value of around 0.85 tends to yield plausible results
(Figure 23.28 (right)).
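
A minimal sketch of the varying-base idea follows. It assumes that the base is interpolated as 2 + 8 · bias_p(L / L_max), which matches the stated range of bases from 2 to 10 but should be checked against Drago et al. (2003) before use. The function names are hypothetical.

```python
import math

def bias(x, p):
    """Perlin and Hoffert's bias function: x ** (log(p) / log(1/2))."""
    return x ** (math.log10(p) / math.log10(0.5))

def adaptive_log_tonemap(L, L_max, p=0.85):
    """Logarithmic mapping whose base runs from 2 (dark) to 10 (brightest)."""
    base = 2.0 + 8.0 * bias(L / L_max, p)          # assumed interpolation of the base
    # log_base(1 + L) = log10(1 + L) / log10(base); normalize by log10(1 + L_max).
    return math.log10(1.0 + L) / (math.log10(1.0 + L_max) * math.log10(base))

mid_bias = bias(0.5, 0.85)                   # bias maps 0.5 to p by construction
brightest = adaptive_log_tonemap(100.0, 100.0)
darkest = adaptive_log_tonemap(0.0, 100.0)
```

At the brightest pixel the base is 10 and the output is 1; darker pixels use a smaller base, which lifts their output relative to a fixed base-10 mapping.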

Alternatively, tone reproduction may be based on histogram equalization.
Traditional histogram equalization aims to give each luminance value equal
probability of occurrence in the output image. Greg Ward refines this method in
a manner that preserves contrast (Ward Larson et al., 1997).

First, a histogram is computed from the luminances in the high dynamic range 
image. From this histogram, a cumulative histogram is computed such that each 
bin contains the number of pixels that have a luminance value less than or equal 
to the luminance value that the bin represents. The cumulative histogram is a 
monotonically increasing function. Plotting the values in each bin against the 
luminance values represented by each bin therefore yields a function which may 
be viewed as a luminance mapping function. Scaling this function, such that the 
vertical axis spans the range of the display device, yields a tone reproduction 
operator. This technique is called histogram equalization. 
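
The construction of the cumulative histogram and its use as a mapping can be sketched as follows. This is plain histogram equalization only, without the later refinement. Binning in log luminance, the bin count of 100, and the function name are assumptions made for this example.

```python
import math

def histogram_equalize(luminances, bins=100):
    """Map each luminance through the normalized cumulative histogram."""
    logs = [math.log(L) for L in luminances]   # assumes strictly positive input
    lo, hi = min(logs), max(logs)
    width = (hi - lo) / bins
    if width == 0.0:                            # constant image: avoid dividing by zero
        width = 1.0

    def bin_index(v):
        return min(int((v - lo) / width), bins - 1)

    counts = [0] * bins
    for v in logs:
        counts[bin_index(v)] += 1

    cumulative = []
    running = 0
    for c in counts:                            # monotonically increasing by construction
        running += c
        cumulative.append(running)

    total = float(cumulative[-1])
    return [cumulative[bin_index(v)] / total for v in logs]

display = histogram_equalize([0.01, 0.5, 2.0, 50.0, 4000.0])
```

The output preserves the ordering of the input luminances, and the brightest value always maps to 1.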

Ward further refined this method by ensuring that the gradient of this function
never exceeds 1. This means that if the difference between neighboring values
in the cumulative histogram is too large, this difference is clamped to 1. This
avoids the problem that small changes in luminance in the input may yield large
differences in the output image. In other words, by limiting the gradient of the
cumulative histogram to 1, contrast is never exaggerated. The resulting algorithm
is called histogram adjustment (see Figure 23.29).



Figure 23.29. A linearly scaled image (left) and a histogram adjusted image (right). Image 
created with the kind permission of the Albin Polasek museum, Winter Park, Florida. 

23.11 Night Tonemapping


The tone reproduction operators discussed so far nearly all assume that the
image represents a scene under photopic viewing conditions, i.e., as seen at normal
light levels. For scotopic scenes, i.e., very dark scenes, the human visual system
exhibits distinctly different behavior. In particular, perceived contrast is lower,
visual acuity (i.e., the smallest detail that we can distinguish) is lower, and
everything has a slightly blue appearance.

To allow such images to be viewed correctly on monitors placed in photopic
lighting conditions, we may preprocess the image such that it appears as if we
were adapted to a very dark viewing environment. Such preprocessing frequently
takes the form of a reduction in brightness and contrast, desaturation of the image,
blue shift, and a reduction in visual acuity (Thompson et al., 2002).

A typical approach starts by converting the image from RGB to XYZ. Then, 
scotopic luminance V may be computed for each pixel: 


$$ V = Y \left[ 1.33 \left( 1 + \frac{Y + Z}{X} \right) - 1.68 \right]. $$
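
As a quick numerical check of this formula, the following sketch computes V for one pixel already expressed in XYZ. The function name is hypothetical, and the RGB-to-XYZ conversion is left out because it depends on the color space of the input image.

```python
def scotopic_luminance(X, Y, Z):
    """Approximate scotopic luminance V from CIE XYZ (requires X > 0)."""
    return Y * (1.33 * (1.0 + (Y + Z) / X) - 1.68)

# For X = Y = Z = 1 the bracket evaluates to 1.33 * 3 - 1.68.
v_equal = scotopic_luminance(1.0, 1.0, 1.0)
```

Pixels with X equal to zero need separate handling in a real implementation, since the formula divides by X.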


This single channel image may then be scaled and multiplied by an empirically
chosen bluish gray. An example is shown in Figure 23.30. If some pixels are in
the photopic range, then the night image may be created by linearly blending the
bluish gray image with the input image. The fraction to use for each pixel
depends on V.

Loss of visual acuity may be modelled by low-pass filtering the night image,
although this would give an incorrect sense of blurriness. A better approach is
to apply a bilateral filter to retain sharp edges while blurring smaller details
(Tomasi & Manduchi, 1998).

Finally, the color transfer technique outlined in Section 23.3 may also be used
to transform a day-lit image into a night scene. The effectiveness of this
approach depends on the availability of a suitable night image from which to
transfer colors. As an example, the image in Figure 23.12 is transformed into a
night image in Figure 23.31.



Figure 23.30. Simulated night scene using 
the image shown in Figure 23.12. (See also 
Plates XV and XIX.) 

23.12 Discussion

Since global illumination algorithms naturally produce high dynamic range
images, direct display of the resulting images is not possible. Rather than resort
to linear scaling or clamping, a tone reproduction operator should be used. Any
tone reproduction operator is better than using no tone reproduction. Depending
on the requirements of the application, one of several operators may be suitable.

For instance, real-time rendering applications should probably resort to a
simple sigmoidal compression, since such operators are fast enough to run in
real time. In addition, their visual quality is often good enough. The histogram
adjustment technique (Ward Larson et al., 1997) may also be fast enough for
real-time operation.

For scenes containing a very high dynamic range, better compression may
be achieved with a local operator. However, the computational cost is frequently
substantially higher, leaving these operators suitable only for non-interactive
applications. Among the fastest of the local operators is the bilateral filter, due
to the optimizations afforded by this technique (Durand & Dorsey, 2002).

This filter is interesting as a tone reproduction operator by itself, or it may 
be used to compute a local adaptation level for use in a sigmoidal compression 
function. In either case, the filter respects sharp contrast changes and smoothes 
over smaller contrasts. This is an important feature that helps minimize halo 
artifacts which are a common problem with local operators. 

An alternative approach to minimize halo artifacts is the scale selection
mechanism used in the photographic tone reproduction operator (Reinhard et al.,
2002), although this technique is slower to compute.

In summary, while a large number of tone reproduction operators is currently
available, only a small number of fundamentally different approaches exist.
Fourier-domain and gradient-domain operators are both rooted in knowledge of
image formation. Spatial-domain operators are either spatially variant (local) or
global in nature. These operators are usually based on insights gained from
studying the human visual system (and the visual system of many other species).




Global Illumination 


Many surfaces in the real world receive most or all of their incident light from
other reflective surfaces. This is often called indirect lighting or mutual
illumination. For example, the ceilings of most rooms receive little or no
illumination directly from luminaires (light emitting objects). The direct and
indirect components of illumination are shown in Figure 24.1.

Although accounting for the interreflection of light between surfaces is
straightforward, it is potentially costly because all surfaces may reflect any given
surface, resulting in as many as $O(N^2)$ interactions for N surfaces. Because the
entire global database of objects may illuminate any given object, accounting for
indirect illumination is often called the global illumination problem.

Figure 24.1. In the left and middle images, the indirect and direct lighting, respectively,
are separated out. On the right, the sum of both components is shown. Global illumination
algorithms account for both the direct and the indirect lighting.

There is a rich and complex literature on solving the global illumination
problem (e.g., (Appel, 1968; Goral et al., 1984; Cook et al., 1984; Immel et al.,
1986; Kajiya, 1986; Malley, 1988)). In this chapter we discuss two algorithms as
examples: particle tracing and path tracing. The first is useful for walkthrough
applications such as maze games, and as a component of batch rendering. The second
is useful for realistic batch rendering. Then we discuss separating out “direct”
lighting where light takes exactly one bounce between luminaire and camera.


24.1 Particle Tracing for Lambertian Scenes 


Recall the transport equation from Section 20.2:

$$ L_s(\mathbf{k}_o) = \int_{\text{all } \mathbf{k}_i} \rho(\mathbf{k}_i, \mathbf{k}_o)\, L_f(\mathbf{k}_i) \cos\theta_i \, d\sigma_i. $$

The geometry for this equation is shown in Figure 24.2. When the illuminated
point is Lambertian, this equation reduces to

$$ L_s = \frac{R}{\pi} \int_{\text{all } \mathbf{k}_i} L_f(\mathbf{k}_i) \cos\theta_i \, d\sigma_i, $$

where R is the diffuse reflectance. One way to approximate the solution to this
equation is to use finite element methods. First, we break the scene into N
surfaces, each with unknown surface radiance $L_i$, reflectance $R_i$, and emitted
radiance $E_i$. This results in the set of N simultaneous linear equations

$$ L_i = E_i + \frac{R_i}{\pi} \sum_{j=1}^{N} k_{ij} L_j, $$

where $k_{ij}$ is a constant related to the original integral representation. We then
solve this set of linear equations, and we can render N constant-colored polygons.
This finite element approach is often called radiosity.



Figure 24.2. The geometry for the transport equation in its directional form. 
An alternative method to radiosity is to use a statistical simulation approach by
randomly following light “particles” from the luminaire through the environment.
This is a type of particle tracing. There are many algorithms that use some form
of particle tracing; we will discuss a form of particle tracing that deposits light
in the textures on triangles. First, we review some basic radiometric relations.
The radiance L of a Lambertian surface with area A is directly proportional to the
incident power per unit area:

$$ L = \frac{\Phi}{\pi A}, \tag{24.1} $$

where $\Phi$ is the outgoing power from the surface. Note that in this discussion, all
radiometric quantities are either spectral or RGB depending on the
implementation. If the surface has emitted power $\Phi_e$, incident power $\Phi_i$, and
reflectance R, then this equation becomes

$$ L = \frac{\Phi_e + R\,\Phi_i}{\pi A}. $$

If we are given a model with $\Phi_e$ and R specified for each triangle, we can proceed
luminaire by luminaire, firing power in the form of particles from each luminaire.
We associate a texture map with each triangle to store accumulated radiance, with
all texels initialized to

$$ L = \frac{\Phi_e}{\pi A}. $$

If a given triangle has area A and $n_t$ texels, and it is hit by a particle carrying
power $\phi$, then the radiance of that texel is incremented by

$$ \Delta L = \frac{n_t R\,\phi}{\pi A}. $$

Once a particle hits a surface, we increment the radiance of the texel it hits,
probabilistically decide whether to reflect the particle, and if we reflect it we
choose a direction and adjust its power.

Figure 24.3. The path of a particle that survives with probability 0.5 and is absorbed at the
last intersection. The RGB power is shown for each path segment.

Note that we want the particle to terminate at some point. To each surface
interaction we can assign a reflection probability p. A natural choice
would be to let p = R, as it is with light in nature. The particle would then scatter
around the environment, not losing or gaining any energy until it is absorbed.
This approach works well when the particles carry a single wavelength (Walter et
al., 1997). However, when a spectrum or RGB triple is carried by the ray, as is
often implemented (Jensen, 2001), there is no single R and some compromise for
the value of p should be chosen. The power $\phi'$ for reflected particles should be
adjusted to account for the possible extinction of the particles:

$$ \phi' = \frac{R\,\phi}{p}. $$

Note that p can be set to any positive constant less than one, and that this constant
can be different for each interaction. When p < R for a given wavelength, the
particle will gain power at that wavelength, and when p > R it will lose power
at that wavelength. The case where it gains power will not interfere with
convergence because the particle will stop scattering and be terminated at some
point as long as p < 1. For the remainder of this discussion we set p = 0.5. The
path of a single particle in such a system is shown in Figure 24.3.
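
The survival test and the power adjustment can be sketched for an RGB particle as below. This is an illustration only: the function name `scatter` is hypothetical, and the random number is passed in explicitly so the behavior is easy to verify.

```python
def scatter(phi, R, p, xi):
    """Decide whether a particle survives a surface interaction.

    phi : RGB power carried by the particle
    R   : RGB reflectance of the surface
    p   : survival probability, 0 < p < 1
    xi  : canonical random number in [0, 1)
    Returns the adjusted RGB power, or None if the particle is absorbed.
    """
    if xi >= p:
        return None                                   # absorbed: path ends here
    return tuple(r * c / p for r, c in zip(R, phi))   # phi' = R * phi / p

survived = scatter((1.0, 1.0, 1.0), (0.5, 0.25, 1.0), p=0.5, xi=0.3)
absorbed = scatter((1.0, 1.0, 1.0), (0.5, 0.25, 1.0), p=0.5, xi=0.7)
```

In the surviving case the blue channel, where R exceeds p, doubles its power, while the green channel, where R is below p, loses half. Averaged over both outcomes, the expected reflected power is p · (Rφ/p) = Rφ, so the estimate stays unbiased.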

A key part of this algorithm is that we scatter the light with an appropriate
distribution for Lambertian surfaces. As discussed in Section 14.4.1, we can find
a vector with a cosine (Lambertian) distribution by transforming two canonical
random numbers $(\xi_1, \xi_2)$ as follows:

$$ \mathbf{a} = \left( \cos(2\pi\xi_1)\sqrt{\xi_2},\; \sin(2\pi\xi_1)\sqrt{\xi_2},\; \sqrt{1 - \xi_2} \right). \tag{24.2} $$

Note that this assumes the normal vector is parallel to the z-axis. For a triangle,
we must establish an orthonormal basis with w parallel to the normal vector. We
can accomplish this as follows:

$$
\mathbf{w} = \frac{\mathbf{n}}{\|\mathbf{n}\|}, \qquad
\mathbf{u} = \frac{\mathbf{p}_1 - \mathbf{p}_0}{\|\mathbf{p}_1 - \mathbf{p}_0\|}, \qquad
\mathbf{v} = \mathbf{w} \times \mathbf{u},
$$

where $\mathbf{p}_i$ are the vertices of the triangle. Then, by definition, our vector in the
appropriate coordinates is

$$ \mathbf{a} = \cos(2\pi\xi_1)\sqrt{\xi_2}\,\mathbf{u} + \sin(2\pi\xi_1)\sqrt{\xi_2}\,\mathbf{v} + \sqrt{1 - \xi_2}\,\mathbf{w}. \tag{24.3} $$
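
The basis construction and the cosine-distributed direction can be sketched together in Python. The helper names are hypothetical, and the normal is assumed to come from the triangle's vertex order (counterclockwise vertices give the front-facing normal); a model with explicit normals would use those instead.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(a):
    length = math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
    return (a[0] / length, a[1] / length, a[2] / length)

def cosine_direction(p0, p1, p2, xi1, xi2):
    """Direction with cosine density about the triangle normal."""
    w = normalize(cross(sub(p1, p0), sub(p2, p0)))   # unit normal (assumed winding)
    u = normalize(sub(p1, p0))                       # along the first edge
    v = cross(w, u)                                  # completes the basis
    r = math.sqrt(xi2)
    a = math.cos(2.0 * math.pi * xi1) * r
    b = math.sin(2.0 * math.pi * xi1) * r
    c = math.sqrt(1.0 - xi2)
    return tuple(a * u[i] + b * v[i] + c * w[i] for i in range(3))

triangle = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
straight_up = cosine_direction(*triangle, 0.0, 0.0)   # xi2 = 0 returns the normal
grazing = cosine_direction(*triangle, 0.0, 1.0)       # xi2 = 1 lies in the surface plane
```

Every returned vector has unit length and a non-negative component along the normal, so it can be used directly as a ray direction.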


In pseudocode our algorithm for p = 0.5 and one luminaire is:

for (each of n particles) do
    RGB φ = Φ/n
    compute uniform random point a on luminaire
    compute random direction b with cosine density
    done = false
    while not done do
        if (ray a + tb hits at some point c) then
            add n_t R φ / (π A) to appropriate texel
            if (ξ_1 > 0.5) then
                φ = 2Rφ
                a = c
                b = random direction with cosine density
            else
                done = true
        else
            done = true

Here ξ_i are canonical random numbers, and a particle that leaves the scene
without hitting anything is terminated. Once this code has run, the texture maps
store the radiance of each triangle and can be rendered directly for any viewpoint
with no additional computation.


24.2 Path Tracing 


While particle tracing is well suited to precomputation of the radiances of diffuse 
scenes, it is problematic for creating images of scenes with general BRDFs or 
scenes that contain many objects. The most straightforward way to create images 
of such scenes is to use path tracing (Kajiya, 1986). This is a probabilistic method 
that sends rays from the eye and traces them back to the light. Often path tracing 
is used only to compute the indirect lighting. Here we will present it in a way 
that captures all lighting, which can be inefficient. This is sometimes called brute
force path tracing. Section 24.3 describes more efficient techniques for direct
lighting that can be added.

In path tracing, we start with the full transport equation:

$$ L_s(\mathbf{k}_o) = L_e(\mathbf{k}_o) + \int_{\text{all } \mathbf{k}_i} \rho(\mathbf{k}_i, \mathbf{k}_o)\, L_f(\mathbf{k}_i) \cos\theta_i \, d\sigma_i. $$

Figure 24.4. In path tracing, a ray is followed through a pixel from the eye and scattered
through the scene until it hits a luminaire.

We use Monte Carlo integration to approximate the solution to this equation for
each viewing ray. Recall from Section 14.3 that we can use random samples to
approximate an integral:

$$ \int_{x \in S} f(x)\, d\mu \approx \frac{1}{N} \sum_{i=1}^{N} \frac{f(x_i)}{p(x_i)}, $$

where the $x_i$ are random points with probability density function p. If we apply
this directly to the transport equation with N = 1 we get

$$ L_s(\mathbf{k}_o) \approx L_e(\mathbf{k}_o) + \frac{\rho(\mathbf{k}_i, \mathbf{k}_o)\, L_f(\mathbf{k}_i) \cos\theta_i}{p(\mathbf{k}_i)}. $$

So if we have a way to select random directions $\mathbf{k}_i$ with a known density p, we
can get an estimate. The catch is that $L_f(\mathbf{k}_i)$ is itself an unknown. Fortunately
we can apply recursion and use a statistical estimate for $L_f(\mathbf{k}_i)$ by sending a ray
in that direction to find the surface seen in that direction. We end when we hit
a luminaire and $L_e$ is non-zero (Figure 24.4). This method assumes lights have
zero reflectance, or we would continue to recurse.

In the case of a Lambertian BRDF ($\rho = R/\pi$), we can use a cosine density
function:

$$ p(\mathbf{k}_i) = \frac{\cos\theta_i}{\pi}. $$

A direction with this density can be chosen according to Equation (24.3). This
allows some cancellation of cosine terms in our estimate:

$$ L_s(\mathbf{k}_o) \approx L_e(\mathbf{k}_o) + R\, L_f(\mathbf{k}_i). $$
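
The cancellation can be checked numerically. The sketch below evaluates the one-sample estimator for the reflected term only, in a local frame whose z-axis is the surface normal, and keeps the BRDF, cosine, and density as separate factors so that the cancellation is visible. The incoming radiance is supplied as a plain function, which is an assumption made for this example; in a path tracer it would be the recursive ray evaluation.

```python
import math
import random

def cosine_sample(xi1, xi2):
    """Direction with density cos(theta)/pi about the z-axis (Equation (24.2))."""
    r = math.sqrt(xi2)
    return (math.cos(2.0 * math.pi * xi1) * r,
            math.sin(2.0 * math.pi * xi1) * r,
            math.sqrt(1.0 - xi2))

def estimate_radiance(R, incoming, rng):
    """One-sample estimate of the reflected radiance of a Lambertian surface."""
    k = cosine_sample(rng.random(), rng.random())
    cos_theta = k[2]
    rho = R / math.pi                     # Lambertian BRDF
    pdf = cos_theta / math.pi             # cosine density
    return rho * incoming(k) * cos_theta / pdf   # algebraically R * incoming(k)

rng = random.Random(0)
# Under constant incoming radiance every single sample already gives R * L_f.
constant_case = estimate_radiance(0.5, lambda k: 2.0, rng)
```

When the incoming radiance varies with direction, individual samples differ and the estimate converges only as samples are averaged, which is the source of the noise discussed below.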

In pseudocode such a path tracer for Lambertian surfaces would operate just
like the ray tracers described in Chapter 4, but the raycolor function would be
modified:

RGB raycolor(ray a + tb, int depth)
    if (ray hits at some point c) then
        RGB e = L_e(−b)
        if (depth < maxdepth) then
            compute random direction d
            return e + R raycolor(c + sd, depth+1)
        else
            return e
    else
        return background color

This will result in a very noisy image unless either large luminaires or very large 
numbers of samples are used. Note the color of the luminaires must be well above 
one (sometimes thousands or tens of thousands) to make the surfaces have final 
colors near one, because only those rays that hit a luminaire by chance will make 
a contribution, and most rays will contribute only a color near zero. To generate 
the random direction d, we use the same technique as we do in particle tracing 
(see Equation (24.2)). 

In the general case we might want to use spectral colors or use a more general 
BRDF. In practice, we should have the material class contain member functions 
to compute a random direction as well as compute the p associated with that 
direction. This way materials can be added transparently to an implementation. 


24.3 Accurate Direct Lighting 

This section presents a more physically based method of direct lighting than
Chapter 10. These methods will be useful in making global illumination
algorithms more efficient. The key idea is to send shadow rays to the luminaires as
described in Chapter 4, but to do so with careful bookkeeping based on the
transport equation from the previous chapter. The global illumination algorithms can
be adjusted to make sure they compute the direct component exactly once. For
example, in particle tracing, particles coming directly from the luminaire would
not be logged, so the particles would only encode indirect lighting. This makes
nice looking shadows much more efficiently than computing direct lighting in the
context of global illumination.


24.3.1 Mathematical Framework 



Figure 24.5. The direct lighting terms for Equation (24.4).


To calculate the direct light from one luminaire (light emitting object) onto a
non-emitting surface, we solve a form of the transport equation from Section 20.2:

$$ L_s(\mathbf{x}, \mathbf{k}_o) = \int_{\text{all } \mathbf{x}'} \frac{\rho(\mathbf{k}_i, \mathbf{k}_o)\, L_e(\mathbf{x}', -\mathbf{k}_i)\, v(\mathbf{x}, \mathbf{x}') \cos\theta_i \cos\theta'}{\|\mathbf{x} - \mathbf{x}'\|^2} \, dA'. \tag{24.4} $$

Recall that $L_e$ is the emitted radiance of the source, v is a visibility function that
is equal to 1 if x “sees” x' and zero otherwise, and the other variables are as
illustrated in Figure 24.5.

If we are to sample Equation (24.4) using Monte Carlo integration, we need
to pick a random point x' on the surface of the luminaire with density function p
(so x' ~ p). Just plugging into Equation (14.5) with one sample yields

$$ L_s(\mathbf{x}, \mathbf{k}_o) \approx \frac{\rho(\mathbf{k}_i, \mathbf{k}_o)\, L_e(\mathbf{x}', -\mathbf{k}_i)\, v(\mathbf{x}, \mathbf{x}') \cos\theta_i \cos\theta'}{p(\mathbf{x}')\, \|\mathbf{x} - \mathbf{x}'\|^2}. \tag{24.5} $$

If we pick a uniform random point on the luminaire, then p = 1/A, where A is
the area of the luminaire. This gives

$$ L_s(\mathbf{x}, \mathbf{k}_o) \approx \frac{\rho(\mathbf{k}_i, \mathbf{k}_o)\, L_e(\mathbf{x}', -\mathbf{k}_i)\, v(\mathbf{x}, \mathbf{x}')\, A \cos\theta_i \cos\theta'}{\|\mathbf{x} - \mathbf{x}'\|^2}. \tag{24.6} $$

We can use Equation (24.6) to sample planar (e.g., rectangular) luminaires in a
straightforward fashion. We simply pick a random point on each luminaire.

The code for one luminaire is:

color directLight(x, k_o, n)
    pick random point x' with normal vector n' on light
    d = x' − x
    k_i = d/||d||
    if (ray x + td has no hits for t < 1 − ε) then
        return ρ(k_i, k_o) L_e(x', −k_i) A (n · d)(−n' · d)/||d||^4
    else
        return 0

The above code needs some extra tests such as clamping the cosines to zero if
they are negative. The factor A is the area of the luminaire from Equation (24.6).
Note that the term ||d||^4 comes from the distance squared term and the two
cosines, e.g., n · d = ||d|| cos θ_i, because d is not necessarily a unit vector.

Figure 24.6. Various soft shadows on a backlit sphere with a square and an area light
source. Top: 1 sample. Bottom: 100 samples. Note that the shape of the light source is less
important than its size in determining shadow appearance.

Several examples of soft shadows are shown in Figure 24.6. 
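
The estimator of Equation (24.6) can be written as a small Python function once a sample point on the luminaire has been chosen. This is a sketch under simplifying assumptions: the BRDF value and emitted radiance are passed in as plain numbers, the shadow-ray result is supplied as a boolean `visible`, and the function name is hypothetical.

```python
import math

def direct_light_sample(x, n, x_light, n_light, rho, L_e, area, visible=True):
    """One-sample direct lighting estimate for a uniformly sampled luminaire.

    x, n             : shaded point and its unit normal
    x_light, n_light : sampled point on the luminaire and its unit normal
    rho, L_e         : BRDF value and emitted radiance for this pair of directions
    area             : luminaire area A (the density is p = 1/A)
    visible          : result of the shadow-ray test
    """
    d = tuple(b - a for a, b in zip(x, x_light))
    dist2 = sum(c * c for c in d)
    if not visible or dist2 == 0.0:
        return 0.0
    # Unnormalized cosines: each dot product carries one factor of ||d||.
    cos_i = max(0.0, sum(a * b for a, b in zip(n, d)))
    cos_l = max(0.0, -sum(a * b for a, b in zip(n_light, d)))
    return rho * L_e * area * cos_i * cos_l / (dist2 * dist2)

# A light of area 4 directly above the point at distance 2, facing it.
facing = direct_light_sample((0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 0, -1),
                             rho=1.0 / math.pi, L_e=1.0, area=4.0)
# The same light facing away contributes nothing after clamping.
back = direct_light_sample((0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 0, 1),
                           rho=1.0 / math.pi, L_e=1.0, area=4.0)
```

In the first case both cosines are 1 and the area exactly cancels the squared distance, so the estimate equals ρ · L_e, which is a convenient value to check by hand.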


24.3.2 Sampling a Spherical Luminaire 

Though a sphere with center c and radius R can be sampled using Equation (24.6),
this sampling will yield a very noisy image because many samples will be on the
back of the sphere, and the $\cos\theta'$ term varies so much. Instead, we can use a more
complex p(x') to reduce noise.

The first non-uniform density we might try is $p(\mathbf{x}') \propto \cos\theta'$. This turns out to
be just as complicated as sampling with $p(\mathbf{x}') \propto \cos\theta' / \|\mathbf{x}' - \mathbf{x}\|^2$, so we instead
discuss that here. We observe that sampling on the luminaire this way is the
same as using a constant density function $q(\mathbf{k}_i) = \text{const}$ defined in the space of
directions subtended by the luminaire as seen from x. We now use a coordinate
system defined with x at the origin, and a right-handed orthonormal basis with
$\mathbf{w} = (\mathbf{c} - \mathbf{x}) / \|\mathbf{c} - \mathbf{x}\|$, and $\mathbf{v} = (\mathbf{w} \times \mathbf{n}) / \|\mathbf{w} \times \mathbf{n}\|$ (see Figure 24.7). We
also define $(\alpha, \phi)$ to be the polar and azimuthal angles with respect to the uvw
coordinate system.

Figure 24.7. Geometry for direct lighting at point x from a spherical luminaire.

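The maximum $\alpha$ that includes the spherical luminaire is given by

$$ \alpha_{max} = \arcsin\left( \frac{R}{\|\mathbf{x} - \mathbf{c}\|} \right) = \arccos \sqrt{1 - \left( \frac{R}{\|\mathbf{x} - \mathbf{c}\|} \right)^2}. $$

Thus, a uniform density (with respect to solid angle) within the cone of directions
subtended by the sphere is just the reciprocal of the solid angle $2\pi(1 - \cos\alpha_{max})$
subtended by the sphere:

$$ q(\mathbf{k}_i) = \frac{1}{2\pi \left( 1 - \sqrt{1 - \left( \dfrac{R}{\|\mathbf{x} - \mathbf{c}\|} \right)^2} \right)}. $$

And we get

$$
\begin{bmatrix} \cos\alpha \\ \phi \end{bmatrix} =
\begin{bmatrix} 1 - \xi_1 + \xi_1 \sqrt{1 - \left( \dfrac{R}{\|\mathbf{x} - \mathbf{c}\|} \right)^2} \\ 2\pi\xi_2 \end{bmatrix}.
$$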


This gives us the direction $\mathbf{k}_i$. To find the actual point, we need to find the first
point on the sphere in that direction. The ray in that direction is just $(\mathbf{x} + t\mathbf{k}_i)$,
where $\mathbf{k}_i$ is given by

$$
\mathbf{k}_i =
\begin{bmatrix} u_x & v_x & w_x \\ u_y & v_y & w_y \\ u_z & v_z & w_z \end{bmatrix}
\begin{bmatrix} \cos\phi \sin\alpha \\ \sin\phi \sin\alpha \\ \cos\alpha \end{bmatrix}.
$$

Figure 24.8. A sphere with $L_e = 1$ touching a sphere of reflectance 1. Where the two
spheres touch, the reflective sphere should have L(x') = 1. Left: 1 sample. Middle: 100
samples. Right: 100 samples, close-up.

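We must also calculate p(x'), the probability density function with respect to the
area measure (recall that the density function q is defined in solid angle space).
Since we know that q is a valid probability density function using the $\omega$ measure,
and we know that $d\omega = dA(\mathbf{x}') \cos\theta' / \|\mathbf{x}' - \mathbf{x}\|^2$, we can relate any probability
density function $q(\mathbf{k}_i)$ with its associated probability density function p(x'):

$$ q(\mathbf{k}_i) = \frac{p(\mathbf{x}')\, \|\mathbf{x}' - \mathbf{x}\|^2}{\cos\theta'}. \tag{24.7} $$

So we can solve for p(x'):

$$ p(\mathbf{x}') = \frac{\cos\theta'}{2\pi \|\mathbf{x}' - \mathbf{x}\|^2 \left( 1 - \sqrt{1 - \left( \dfrac{R}{\|\mathbf{x} - \mathbf{c}\|} \right)^2} \right)}. $$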

A good debugging case for this is shown in Figure 24.8. 
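The sampling procedure for a spherical luminaire can be sketched in a few lines of Python. This is only an illustrative sketch, not the book's code: the function name `sample_sphere_luminaire` and the tuple-based vector helpers are assumptions made here, the basis vector u = v x w is inferred from the requirement that uvw be right-handed, and the code assumes that x lies outside the sphere and that w is not parallel to n.

```python
import math


def sub(a, b):
    return tuple(ai - bi for ai, bi in zip(a, b))


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(a):
    length = math.sqrt(dot(a, a))
    return tuple(ai / length for ai in a)


def sample_sphere_luminaire(x, n, center, radius, xi1, xi2):
    """Return (x_prime, p_area): a point on the sphere and its area-measure pdf.

    Hypothetical helper; assumes x is outside the sphere and that the
    direction to the center is not parallel to the surface normal n.
    """
    to_center = sub(center, x)
    dist = math.sqrt(dot(to_center, to_center))
    # Orthonormal basis: w toward the center, v perpendicular to w and n.
    w = normalize(to_center)
    v = normalize(cross(w, n))
    u = cross(v, w)  # completes a right-handed basis (u x v = w)

    # Constant density q over the cone of directions subtended by the sphere.
    cos_alpha_max = math.sqrt(1.0 - (radius / dist) ** 2)
    q = 1.0 / (2.0 * math.pi * (1.0 - cos_alpha_max))

    # Uniform direction inside the cone.
    cos_alpha = 1.0 - xi1 + xi1 * cos_alpha_max
    sin_alpha = math.sqrt(max(0.0, 1.0 - cos_alpha * cos_alpha))
    phi = 2.0 * math.pi * xi2
    a = math.cos(phi) * sin_alpha
    b = math.sin(phi) * sin_alpha
    k = tuple(a * u[i] + b * v[i] + cos_alpha * w[i] for i in range(3))

    # First intersection of the ray x + t k with the sphere.
    proj = dot(k, to_center)
    disc = proj * proj - (dist * dist - radius * radius)
    t = proj - math.sqrt(max(0.0, disc))  # clamp guards against rounding
    x_prime = tuple(x[i] + t * k[i] for i in range(3))

    # Convert the solid-angle density q to an area density p.
    normal = tuple((x_prime[i] - center[i]) / radius for i in range(3))
    cos_theta_prime = -dot(normal, k)
    p_area = q * cos_theta_prime / (t * t)
    return x_prime, p_area
```

The returned density is what a direct lighting estimator would divide by. The clamp on the discriminant is a design choice made here so that samples at the very edge of the cone do not fail because of rounding.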


24.3.3 Non-diffuse Luminaires


There is no reason the luminance of the luminaire cannot vary with both direction 
and position. For example, it can vary with position if the luminaire is a television. 
It can vary with direction for car headlights and other directional sources. Little 
in our analysis need change from the previous sections, except that $L_e(\mathbf{x}')$ must
change to $L_e(\mathbf{x}', \mathbf{k}_i)$. The simplest way to vary the intensity with direction is to
use a Phong-like pattern with respect to the normal vector $\mathbf{n}'$. To avoid using an
exponent in the term for the total light output, we can use the form


$$L_e(\mathbf{x}', -\mathbf{k}_i) = \frac{(n + 1)\, E(\mathbf{x}')}{2\pi} \cos^{(n-1)}\theta',$$

where $E(\mathbf{x}')$ is the radiant exitance (power per unit area) at point $\mathbf{x}'$, and $n$ is the
Phong exponent. You get a diffuse light for $n = 1$. If the light is non-uniform
across its area, e.g., as a television set is, then $E$ will not be a constant.
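As a quick check of why the factor (n + 1)/(2π) is convenient, the following sketch evaluates the Phong-like emitted radiance and numerically integrates it against the cosine over the hemisphere. The function names are assumptions made for this illustration; the integral should return the radiant exitance E regardless of the exponent n.

```python
import math


def phong_emitted_radiance(exitance, n, cos_theta):
    """L_e = (n + 1) E / (2 pi) * cos^(n-1)(theta), zero below the horizon."""
    if cos_theta < 0.0:
        return 0.0
    return (n + 1.0) * exitance / (2.0 * math.pi) * cos_theta ** (n - 1.0)


def integrated_exitance(exitance, n, steps=20000):
    """Midpoint-rule integral of L_e * cos(theta) over the hemisphere."""
    d_theta = 0.5 * math.pi / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * d_theta
        c = math.cos(theta)
        total += phong_emitted_radiance(exitance, n, c) * c * math.sin(theta) * d_theta
    # The integrand does not depend on the azimuth, which contributes 2 pi.
    return 2.0 * math.pi * total
```

For n = 1 the radiance is the constant E/π of a diffuse emitter; larger n concentrates the same total output around the normal.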


Frequently Asked Questions 

• My pixel values are no longer in some sensible zero-to-one range. What 
should I display? 

You should use one of the tone reproduction techniques described in Chapter 23. 

• What global illumination techniques are used in practice? 

For batch rendering of complex scenes, path tracing with one level of reflection
is often used. Path tracing is often augmented with a particle tracing preprocess
as described in Jensen’s book in the chapter notes. For walkthrough games,
some form of world-space preprocess is often used, such as the particle tracing
described in this chapter. For scenes with very complicated specular transport, an
elegant but involved method, Metropolis Light Transport (Veach & Guibas, 1997),
may be the best choice.

• How does the ambient component relate to global illumination? 

For diffuse scenes, the radiance of a surface is proportional to the product of the
irradiance at the surface and the reflectance of the surface. The ambient component
is just an approximation to the irradiance scaled by the inverse of $\pi$. So
although it is a crude approximation, there can be some methodology to guessing
it (M. F. Cohen et al., 1988), and it is probably more accurate than doing nothing,
i.e., using zero for the ambient term. Because the indirect irradiance can vary
widely within a scene, using a different constant for each surface usually gives
better results than using a single global ambient term.

• Why do most algorithms compute direct lighting using traditional ray 
tracing? 

Although global illumination algorithms automatically compute direct lighting,
and it is in fact slightly more complicated to make them compute only indirect
lighting, it is usually faster to compute direct lighting separately. There are
three reasons for this. First, indirect lighting tends to be smooth compared to
direct lighting (see Figure 24.1) so coarser representations can be used, e.g.,
low-resolution texture maps for particle tracing. The second reason is that light sources
tend to be small, and it is rare to hit them by chance in a “from the eye” method
such as path tracing, while direct shadow rays are efficient. The third reason is
that direct lighting allows stratified sampling so it converges rapidly compared to
unstratified sampling. The issue of stratification is the reason that shadow rays are
used in Metropolis Light Transport despite the stability of its default technique for
dealing with direct lighting as just one type of path to handle.

Figure 24.9. A comparison between a rendering and a photo. Image courtesy Sumant
Pattanaik and the Cornell Program of Computer Graphics. (See also Plate XXI.)

• How artificial is it to assume ideal diffuse and specular behavior? 

For environments that have only matte and mirrored surfaces, the Lambertian/ 
specular assumption works well. A comparison between a rendering using that 
assumption and a photograph is shown in Figure 24.9. 

• How many shadow rays are needed per pixel? 

Typically between 16 and 400. Using narrow penumbra, a large ambient term (or 
a large indirect component), and a masking texture (Ferwerda et al., 1997) can 
reduce the number needed. 

• How do I sample something like a filament with a metal reflector where 
much of the light is reflected from the filament? 

Typically the whole light is replaced by a simple source that approximates its
aggregate behavior. For viewing rays, the complicated source is used. So a car
headlight would look complex to the viewer, but the lighting code might see
simple disk-shaped lights.

• Isn’t something like the sky a luminaire?

Yes, and you can treat it as one. However, such large light sources may not be 
helped by direct lighting; the brute-force techniques are likely to work better. 


Notes 

Global illumination has its roots in the fields of heat transfer and illumination
engineering as documented in Radiosity: A Programmer’s Perspective (Ashdown,
1994). Other good books related to global illumination include Radiosity and
Global Illumination (M. F. Cohen & Wallace, 1993), Radiosity and Realistic
Image Synthesis (Sillion & Puech, 1994), Principles of Digital Image Synthesis
(Glassner, 1995), Realistic Image Synthesis Using Photon Mapping (Jensen,
2001), Advanced Global Illumination (Dutré et al., 2002), and Physically Based
Rendering (Pharr & Humphreys, 2004). The probabilistic methods discussed
in this chapter are from Monte Carlo Techniques for Direct Lighting Calculations
(Shirley et al., 1996).


Exercises 

1. For a closed environment, where every surface is a diffuse reflector and
emitter with reflectance R and emitted radiance E, what is the total radiance
at each point? Hint: for R = 0.5 and E = 0.25 the answer is 0.5.
This is an excellent debugging case.

2. Using the definitions from Chapter 20, verify Equation (24.1). 

3. If we want to render a typically-sized room with textures at centimeter- 
square resolution, approximately how many particles should we send to get 
an average of about 1000 hits per texel? 

4. Develop a method to take random samples with uniform density from a 
disk. 

5. Develop a method to take random samples with uniform density from a 
triangle. 

6. Develop a method to take uniform random samples on a “sky dome” (the 
inside of a hemisphere). 


Reflection Models 


As we discussed in Chapter 20, the reflective properties of a surface can be summarized
using the BRDF (Nicodemus et al., 1977; Cook & Torrance, 1982). In
this chapter, we discuss some of the most visually important aspects of material
properties and a few fairly simple models that are useful in capturing these properties.
There are many BRDF models in use in graphics, and the models presented
here are meant to give just an idea of non-diffuse BRDFs.

25.1 Real-World Materials 

Many real materials have a visible structure at normal viewing distances. For
example, most carpets have easily visible pile that contributes to appearance. For
our purposes, such structure is not part of the material property but is, instead, part
of the geometric model. Structure whose details are invisible at normal viewing
distances, but which does determine macroscopic material appearance, is part of
the material property. For example, the fibers in paper have a complex appearance
under magnification, but they are blurred together into a homogeneous appearance
when viewed at arm's length. This distinction between visible structure that belongs
to the geometric model and microstructure that is folded into the BRDF is somewhat
arbitrary and depends on what one defines as “normal” viewing distance and visual
acuity, but the distinction has proven quite useful in practice.

In this section we define some categories of materials. Later in the chapter,
we present reflection models that target each type of material. In the notes at the
end of the chapter some models that account for more exotic materials are also
discussed.

Figure 25.1. The amount of light reflected and transmitted by glass varies with
the angle.

Figure 25.2. Light is repeatedly reflected and refracted by glass, with the
fractions of energy shown.


25.1.1 Smooth Dielectrics and Metals 

Dielectrics are clear materials that refract light; their basic properties were summarized
in Chapter 4. Metals reflect and refract light much like dielectrics, but
they absorb light very, very quickly. Thus, only very thin metal sheets are transparent
at all, e.g., the thin gold plating on some glass objects. For a smooth
material, there are only two important properties:

1. How much light is reflected at each incident angle and wavelength. 

2. What fraction of light is absorbed as it travels through the material for a 
given distance and wavelength. 

The amount of light transmitted is whatever is not reflected (a result of energy 
conservation). For a metal, in practice, we can assume all the light is immediately 
absorbed. For a dielectric, the fraction is determined by the constant used in 
Beer’s Law as discussed in Chapter 4. 

The amount of light reflected is determined by the Fresnel equations as discussed
in Chapter 4. These equations are straightforward, but cumbersome. The
main effect of the Fresnel equations is to increase the reflectance as the incident
angle increases, particularly near grazing angles. This effect works for transmitted
light as well. These ideas are shown diagrammatically in Figure 25.1. Note that
the light is repeatedly reflected and refracted as shown in Figure 25.2. Usually
only one or two of the reflected images are easily visible.


25.1.2 Rough Surfaces 

If a metal or dielectric is roughened to a small degree, but not so small that diffraction
occurs, then we can think of it as a surface with microfacets (Cook & Torrance,
1982). Such surfaces behave specularly at a closer distance, but viewed
at a further distance seem to spread the light out in a distribution. For a metal,
an example of this rough surface might be brushed steel, or the “cloudy” side of
most aluminum foil.

For dielectrics, such as a sheet of glass, scratches or other irregular surface
features make the glass blur the reflected and transmitted images that we can
normally see clearly. If the surface is heavily scratched, we call it translucent
rather than transparent. This is a somewhat arbitrary distinction, but it is usually
clear whether we would consider a glass translucent or transparent.

25.1.3 Diffuse Materials

A material is diffuse if it is matte, i.e., not shiny. Many surfaces we see are diffuse,
such as most stones, paper, and unfinished wood. To a first approximation, diffuse
surfaces can be approximated with a Lambertian (constant) BRDF. Real diffuse
materials usually become somewhat specular for grazing angles. This is a subtle
effect, but can be important for realism.


25.1.4 Translucent Materials 


Many thin objects, such as leaves and paper, both transmit and reflect light diffusely.
For all practical purposes no clear image is transmitted by these objects.
These surfaces can add a hue shift to the transmitted light. For example, red paper
is red because it filters out non-red light for light that penetrates a short distance
into the paper, and then scatters back out. The paper also transmits light with a
red hue because the same mechanisms apply, but the transmitted light makes it all
the way through the paper. One implication of this property is that the transmission
coefficient should be the same in both directions.

25.1.5 Layered Materials 

Many surfaces are composed of “layers” or are dielectrics with embedded particles
that give the surface a diffuse property (Phong, 1975). The surface of such
materials reflects specularly as shown in Figure 25.3, and thus obeys the Fresnel
equations. The light that is transmitted is either absorbed or scattered back up
to the dielectric surface where it may or may not be transmitted. The light that
is transmitted, scattered, and then retransmitted in the opposite direction forms a
diffuse “reflection” component.

Note that the diffuse component is also attenuated as the angle increases,
because the Fresnel equations cause reflection back into the surface as the angle
increases as shown in Figure 25.4. Thus instead of a constant diffuse BRDF, one
that vanishes near the grazing angle is more appropriate.

25.2 Implementing Reflection Models 

A BRDF model, as described in Section 20.1.6, will produce a rendering which
is more physically based than the rendering we get from point light sources and
Phong-like models. Unfortunately, real BRDFs are typically quite complicated
and cannot be deduced from first principles. Instead, they must either be measured
and directly approximated from raw data, or they must be crudely approximated
in an empirical fashion. The latter empirical strategy is what is usually done, and
the development of such approximate models is still an area of research. This
section discusses several desirable properties of such empirical models.

Figure 25.3. Light hitting a layered surface can be reflected specularly, or it
can be transmitted and then scatter diffusely off the substrate.

Figure 25.4. The light scattered by the substrate is less and less likely to
make it out of the surface as the angle relative to the surface normal increases.

First, physical constraints imply two properties of a BRDF model. The first
constraint is energy conservation:

$$\text{for all } \mathbf{k}_i, \quad R(\mathbf{k}_i) = \int_{\text{all } \mathbf{k}_o} \rho(\mathbf{k}_i, \mathbf{k}_o) \cos\theta_o \, d\sigma_o \le 1.$$

If you send a beam of light at a surface from any direction $\mathbf{k}_i$, then the total
amount of light reflected over all directions will be at most the incident amount.
The second physical property we expect all BRDFs to have is reciprocity:

$$\text{for all } \mathbf{k}_i, \mathbf{k}_o, \quad \rho(\mathbf{k}_i, \mathbf{k}_o) = \rho(\mathbf{k}_o, \mathbf{k}_i).$$
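Both physical constraints are easy to test numerically for a candidate BRDF. The sketch below is one possible test harness, not a prescribed method: the names `albedo`, `is_reciprocal`, and `lambertian`, and the convention that a BRDF is a function of the spherical angles (theta_i, phi_i, theta_o, phi_o), are assumptions made for illustration.

```python
import math


def albedo(brdf, theta_i, n_theta=200, n_phi=200):
    """Midpoint-rule estimate of R(k_i), the integral of rho * cos(theta_o)
    over the outgoing hemisphere, for incident polar angle theta_i."""
    d_theta = 0.5 * math.pi / n_theta
    d_phi = 2.0 * math.pi / n_phi
    total = 0.0
    for a in range(n_theta):
        theta_o = (a + 0.5) * d_theta
        # cos(theta_o) from the integrand, sin(theta_o) from the solid angle.
        weight = math.cos(theta_o) * math.sin(theta_o) * d_theta * d_phi
        for b in range(n_phi):
            phi_o = (b + 0.5) * d_phi
            total += brdf(theta_i, 0.0, theta_o, phi_o) * weight
    return total


def is_reciprocal(brdf, theta_i, phi_i, theta_o, phi_o, tol=1e-12):
    """Check rho(k_i, k_o) == rho(k_o, k_i) for one pair of directions."""
    forward = brdf(theta_i, phi_i, theta_o, phi_o)
    backward = brdf(theta_o, phi_o, theta_i, phi_i)
    return abs(forward - backward) <= tol * max(1.0, abs(forward))


def lambertian(reflectance):
    """Constant BRDF R / pi, used here only as a simple test subject."""
    return lambda theta_i, phi_i, theta_o, phi_o: reflectance / math.pi
```

For a Lambertian BRDF with reflectance 0.8, `albedo` should come out close to 0.8 for every incident angle, and any value above 1 would indicate a violation of energy conservation.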


Second, we want a clear separation between diffuse and specular components. 
The reason for this is that, although there is a mathematically-clean delta function 
formulation for ideal specular components, delta functions must be implemented 
as special cases in practice. Such special cases are only practical if the BRDF 
model clearly indicates what is specular and what is diffuse. 

Third, we would like intuitive parameters. For example, one reason the Phong 
model has enjoyed such longevity is that its diffuse constant and exponent are 
both clearly related to the intuitive properties of the surface, namely surface color 
and highlight size. 

Finally, we would like the BRDF to be amenable to Monte Carlo
sampling. Recall from Chapter 14 that an integral can be sampled by $N$ random
points $x_i \sim p$ where $p$ is defined with the same measure as the integral:

$$\int f(x)\, d\mu \approx \frac{1}{N} \sum_{i=1}^{N} \frac{f(x_i)}{p(x_i)}.$$

Recall from Section 20.2 that the surface radiance in direction $\mathbf{k}_o$ is given by a
transport equation:

$$L_s(\mathbf{k}_o) = \int_{\text{all } \mathbf{k}_i} \rho(\mathbf{k}_i, \mathbf{k}_o) L_f(\mathbf{k}_i) \cos\theta_i \, d\sigma_i.$$

If we sample directions with pdf $p(\mathbf{k}_i)$ as discussed in Chapter 24, then we can
approximate the surface radiance with samples:

$$L_s(\mathbf{k}_o) \approx \frac{1}{N} \sum_{j=1}^{N} \frac{\rho(\mathbf{k}_j, \mathbf{k}_o) L_f(\mathbf{k}_j) \cos\theta_j}{p(\mathbf{k}_j)}.$$

This approximation will converge for any $p$ that is non-zero where the integrand
is non-zero. However, it will only converge well if the integrand is not very large
relative to $p$. Ideally, $p(\mathbf{k})$ should be approximately shaped like the integrand
$\rho(\mathbf{k}_i, \mathbf{k}_o) L_f(\mathbf{k}_i) \cos\theta_i$. In practice, $L_f$ is complicated, and the best we can
accomplish is to have $p(\mathbf{k})$ shaped somewhat like $\rho(\mathbf{k}, \mathbf{k}_o) L_f(\mathbf{k}) \cos\theta$.

For example, if the BRDF is Lambertian, then it is constant and the “ideal”
$p(\mathbf{k})$ is proportional to $\cos\theta$. Because the integral of $p$ must be one, we can
deduce the leading constant:

$$\int_{\text{all } \mathbf{k} \text{ with } \theta < \pi/2} C \cos\theta \, d\sigma = 1.$$

This implies that $C = 1/\pi$, so we have

$$p(\mathbf{k}) = \frac{1}{\pi} \cos\theta.$$

An acceptably efficient implementation will result as long as $p$ doesn’t get too
small when the integrand is non-zero. Thus, the constant pdf will also suffice:

$$p(\mathbf{k}) = \frac{1}{2\pi}.$$


This emphasizes that many pdfs may be acceptable for a given BRDF model. 
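To make the comparison between the two pdfs concrete, here is a small sketch of the sample-based radiance estimate for a surface whose normal is the z axis. The function names and the callable-based interface are assumptions for illustration. The cosine-weighted sampler uses the common construction of projecting a uniformly sampled disk point up onto the hemisphere, which is one of several ways to obtain the density cos θ/π.

```python
import math
import random


def sample_cosine(xi1, xi2):
    """Direction with pdf cos(theta) / pi; returns (direction, pdf)."""
    r = math.sqrt(xi1)
    phi = 2.0 * math.pi * xi2
    z = math.sqrt(max(0.0, 1.0 - xi1))
    return (r * math.cos(phi), r * math.sin(phi), z), z / math.pi


def sample_uniform(xi1, xi2):
    """Direction with the constant pdf 1 / (2 pi) on the hemisphere."""
    z = xi1
    r = math.sqrt(max(0.0, 1.0 - z * z))
    phi = 2.0 * math.pi * xi2
    return (r * math.cos(phi), r * math.sin(phi), z), 1.0 / (2.0 * math.pi)


def estimate_radiance(brdf, incoming, sampler, n_samples, rng):
    """Average of rho * L_f * cos(theta) / p over n_samples directions."""
    total = 0.0
    for _ in range(n_samples):
        direction, pdf = sampler(rng.random(), rng.random())
        if pdf > 0.0:  # a zero-density sample contributes nothing
            total += brdf(direction) * incoming(direction) * direction[2] / pdf
    return total / n_samples


if __name__ == "__main__":
    rng = random.Random(1)
    lambertian = lambda d: 0.5 / math.pi   # reflectance 0.5
    white_sky = lambda d: 1.0              # constant incoming radiance
    print(estimate_radiance(lambertian, white_sky, sample_cosine, 1000, rng))
    print(estimate_radiance(lambertian, white_sky, sample_uniform, 1000, rng))
```

With a Lambertian BRDF and constant incoming radiance, every cosine-weighted sample evaluates to the same value, so that estimate has no noise, while the constant pdf converges to the same answer with some variance.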


25.3 Specular Reflection Models 


For a metal, we typically specify the reflectance at normal incidence $R_0(\lambda)$. The
reflectance should vary according to the Fresnel equations, and a good approximation
is given by (Schlick, 1994a)

$$R(\theta, \lambda) = R_0(\lambda) + (1 - R_0(\lambda)) (1 - \cos\theta)^5.$$

This approximation allows us to just set the normal reflectance of the metal either
from data or by eye.

For a dielectric, the same formula works for reflectance. However, we can set
$R_0(\lambda)$ in terms of the refractive index $n(\lambda)$:

$$R_0(\lambda) = \left(\frac{n(\lambda) - 1}{n(\lambda) + 1}\right)^2.$$

Typically, $n$ is treated as independent of wavelength, but for applications where
dispersion is important, $n$ can vary with wavelength. The refractive indices that are
often useful include water ($n = 1.33$), glass ($n = 1.4$ to $n = 1.7$), and diamond ($n = 2.4$).

Figure 25.5. Renderings of polished tiles using the coupled model. These images were
produced using a Monte Carlo path tracer. The sampling distribution for the diffuse term is
$\cos\theta/\pi$.
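Schlick's approximation and the normal-incidence reflectance of a dielectric translate directly into code. The following is a minimal sketch with hypothetical function names, treating the refractive index as a single wavelength-independent number.

```python
def r0_from_index(n):
    """Normal-incidence reflectance of a dielectric with refractive index n."""
    return ((n - 1.0) / (n + 1.0)) ** 2


def schlick_reflectance(r0, cos_theta):
    """Schlick's approximation: R0 + (1 - R0) * (1 - cos(theta))^5."""
    return r0 + (1.0 - r0) * (1.0 - cos_theta) ** 5
```

For glass with n = 1.5 this gives a normal reflectance of 0.04, and the approximation rises to 1 at grazing incidence (cos θ = 0) for any R0.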


25.4 Smooth Layered Model 

Reflection in matte/specular materials, such as plastics or polished woods, is governed
by Fresnel equations at the surface and by scattering within the subsurface.
An example of this reflection can be seen in the tiles in the renderings in Figure 25.5.
Note that the blurring in the specular reflection is mostly vertical due
to the compression of apparent bump spacing in the view direction. This effect
causes the vertically-streaked reflections seen on lakes on windy days; it can either
be modeled using explicit micro-geometry and a simple smooth-surface reflection
model or by a more general model that accounts for this asymmetry.

We could use the traditional Lambertian-specular model for the tiles, which
linearly mixes specular and Lambertian terms. In standard radiometric terms, this
can be expressed as

$$\rho(\theta, \phi, \theta', \phi', \lambda) = \frac{R_d(\lambda)}{\pi} + R_s \rho_s(\theta, \phi, \theta', \phi'),$$

where $R_d(\lambda)$ is the hemispherical reflectance of the matte term, $R_s$ is the specular
reflectance, and $\rho_s$ is the normalized specular BRDF (a weighted Dirac delta
function on the sphere). This equation is a simplified version of the BRDF where
$R_s$ is independent of wavelength. The independence of wavelength causes a highlight
that is the color of the luminaire, so a polished rather than a metal appearance
will be achieved. Ward (G. J. Ward, 1992) suggests setting $R_d(\lambda) + R_s \le 1$ in
order to conserve energy. However, such models with constant $R_s$ fail to show
the increase in specularity for grazing viewing angles. This is the key point: in the
real world the relative proportions of matte and specular appearance change with
the viewing angle.

One way to simulate the change in the matte appearance is to explicitly dampen
$R_d(\lambda)$ as $R_s$ increases (Shirley, 1991):

$$\rho(\theta, \phi, \theta', \phi', \lambda) = R_f(\theta) \rho_s(\theta, \phi, \theta', \phi') + \frac{R_d(\lambda)(1 - R_f(\theta))}{\pi},$$

where $R_f(\theta)$ is the Fresnel reflectance for a polish-air interface. The problem with
this equation is that it is not reciprocal, as can be seen by exchanging $\theta$ and $\theta'$;
this changes the value of the matte damping factor because of the multiplication
by $(1 - R_f(\theta))$. The specular term, a scaled Dirac delta function, is reciprocal,
but this does not make up for the non-reciprocity of the matte term. Although this
BRDF works well, its lack of reciprocity can cause some rendering methods to
have ill-defined solutions.

We now present a model that produces the matte/specular tradeoff while remaining
reciprocal and energy conserving. Because the key feature of the new
model is that it couples the matte and specular scaling coefficients, it is called a 
coupled model (Shirley et al., 1997). 

Surfaces which have a glossy appearance are often a clear dielectric, such 
as polyurethane or oil, with some subsurface structure. The specular (mirror-like)
component of the reflection is caused by the smooth dielectric surface and
is independent of the structure below this surface. The magnitude of this specular 
term is governed by the Fresnel equations. 

The light that is not reflected specularly at the surface is transmitted through 
the surface. There, either it is absorbed by the subsurface, or it is reflected from 
a pigment or a subsurface and transmitted back through the surface of the polish.
This transmitted light forms the matte component of reflection. Since the
matte component can only consist of the light that is transmitted, it will naturally 
decrease in total magnitude for increasing angle. 

To avoid choosing between physically plausible models and models with good 
qualitative behavior over a range of incident angles, note that the Fresnel equations
that account for the specular term, $R_f(\theta)$, are derived directly from the
physics of the dielectric-air interface. Therefore, the problem must lie in the 
matte term. We could use a full-blown simulation of subsurface scattering as 
implemented, but this technique is both costly and requires detailed knowledge 
of subsurface structure, which is usually neither known nor easily measurable. 
Instead, we can modify the matte term to be a simple approximation that captures 
the important qualitative angular behavior shown in Figure 25.4. 

Let us assume that the matte term is not Lambertian, but instead is some other
function that depends only on $\theta$, $\theta'$ and $\lambda$: $\rho_m(\theta, \theta', \lambda)$. We discard behavior
that depends on $\phi$ or $\phi'$ in the interest of simplicity. We try to keep the formulas
reasonably simple because the physics of the matte term is complicated and
sometimes requires unknown parameters. We expect the matte term to be close to
constant, and roughly rotationally symmetric (He et al., 1992).

An obvious candidate for the matte component $\rho_m(\theta, \theta', \lambda)$ that will be reciprocal
is the separable form $k R_m(\lambda) f(\theta) f(\theta')$ for some constant $k$ and matte
reflectance parameter $R_m(\lambda)$. We could merge $k$ and $R_m(\lambda)$ into a single term,
but we choose to keep them separated because this makes it more intuitive to set
$R_m(\lambda)$, which must be between 0 and 1 for all wavelengths. Separable BRDFs
have been shown to have several computational advantages, thus we use the separable
model:

$$\rho(\theta, \phi, \theta', \phi', \lambda) = R_f(\theta) \rho_s(\theta, \phi, \theta', \phi') + k R_m(\lambda) f(\theta) f(\theta').$$


We know that the matte component can only contain energy not reflected in the
surface (specular) component. This means that for $R_m(\lambda) = 1$, the incident
and reflected energy are the same, which suggests the following constraint on the
BRDF for each incident $\theta$ and $\lambda$:

$$R_f(\theta) + 2\pi f(\theta) \int_0^{\pi/2} k f(\theta') \cos\theta' \sin\theta' \, d\theta' = 1. \qquad (25.1)$$


We can see that $f(\theta)$ must be proportional to $(1 - R_f(\theta))$. If we assume that
matte components that absorb some energy have the same directional pattern as
this ideal, we get a BRDF of the form

$$\rho(\theta, \phi, \theta', \phi', \lambda) = R_f(\theta) \rho_s(\theta, \phi, \theta', \phi') + k R_m(\lambda) [1 - R_f(\theta)] [1 - R_f(\theta')].$$

We could now insert the full form of the Fresnel equations to get $R_f(\theta)$, and then
use energy conservation to solve for constraints on $k$. Instead, we will use the
approximation discussed in Section 25.3. We find that

$$f(\theta) \propto (1 - (1 - \cos\theta)^5).$$


Applying Equation (25.1) with $f(\theta) = 1 - R_f(\theta)$ yields

$$k = \frac{21}{20\pi(1 - R_0)}. \qquad (25.2)$$


The full coupled BRDF is then

$$\rho(\theta, \phi, \theta', \phi', \lambda) = \left[R_0 + (1 - \cos\theta)^5 (1 - R_0)\right] \rho_s(\theta, \phi, \theta', \phi') + k R_m(\lambda) (1 - R_0)^2 \left[1 - (1 - \cos\theta)^5\right] \left[1 - (1 - \cos\theta')^5\right]. \qquad (25.3)$$

Here the factor $(1 - R_0)^2$ arises because, under Schlick's approximation,
$1 - R_f(\theta) = (1 - R_0)\left[1 - (1 - \cos\theta)^5\right]$.

The results of running the coupled model are shown in Figure 25.5. Note that
for the high viewpoint, the specular reflection is almost invisible, but it is clearly
visible in the low-angle image, where the matte behavior is less obvious.

For reasonable values of refractive indices, $R_0$ is limited to approximately the
range 0.03 to 0.06 (the value $R_0 = 0.05$ was used for Figure 25.5). The value of
$R_s$ in a traditional Phong model is harder to choose, because it typically must be
tuned for viewpoint in static images and tuned for a particular camera sequence
for animations. Thus, the coupled model is easier to use in a “hands-off” mode.
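The energy balance behind the coupled model can be checked numerically. The sketch below is an illustration under stated assumptions rather than a reference implementation: the function names are hypothetical, the matte term is written as k R_m [1 - R_f(θ)][1 - R_f(θ')] with Schlick's approximation for R_f (so each factor includes its (1 - R_0)), and the specular delta term is accounted for by adding its integrated reflectance R_f(θ).

```python
import math


def schlick(r0, cos_theta):
    """Schlick's approximation to the Fresnel reflectance."""
    return r0 + (1.0 - r0) * (1.0 - cos_theta) ** 5


def coupled_matte(r0, rm, cos_in, cos_out):
    """Matte part of the coupled BRDF: k * R_m * (1 - R_f) * (1 - R_f')."""
    k = 21.0 / (20.0 * math.pi * (1.0 - r0))
    return k * rm * (1.0 - schlick(r0, cos_in)) * (1.0 - schlick(r0, cos_out))


def total_reflectance(r0, rm, cos_in, steps=20000):
    """Specular reflectance plus the hemispherical integral of the matte term."""
    d_theta = 0.5 * math.pi / steps
    matte = 0.0
    for i in range(steps):
        theta = (i + 0.5) * d_theta
        c = math.cos(theta)
        matte += coupled_matte(r0, rm, cos_in, c) * c * math.sin(theta) * d_theta
    # The matte term has no azimuthal dependence, so phi contributes 2 pi.
    return schlick(r0, cos_in) + 2.0 * math.pi * matte
```

With R_m = 1 the total should equal 1 for every incident angle, which is exactly the constraint of Equation (25.1); with R_m below 1 the surface absorbs the difference.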


25.5 Rough Layered Model 

The previous model is fine if the surface is smooth. However, if the surface is 
not ideal, some spread is needed in the specular component. An extension of the 
coupled model to this case is presented here (Ashikhmin & Shirley, 2000). At 
a given point on a surface, the BRDF is a function of two directions, one in the 
direction towards the light and one in the direction towards the viewer. We would 
like to have a BRDF model that works for “common” surfaces, such as metal and 
plastic, and has the following characteristics: 

1. plausible. As defined by Lewis (R. R. Lewis, 1994), this refers to the 
BRDF obeying energy conservation and reciprocity. 

2. anisotropy. The material should model simple anisotropy, such as seen on 
brushed metals. 

3. intuitive parameters. For materials such as plastics, there should be
parameters $R_d$ for the substrate and $R_s$ for the normal specular reflectance as
well as two roughness parameters $n_u$ and $n_v$.

4. Fresnel behavior. Specularity should increase as the incident direction
approaches grazing angles.

5. non-Lambertian diffuse term. The material should allow for a diffuse
term, but the component should be non-Lambertian to assure energy conservation
in the presence of Fresnel behavior.

6. Monte Carlo friendliness. There should be some reasonable probability
density function that allows straightforward Monte Carlo sample generation
for the BRDF.

Figure 25.6. Geometry of reflection. Note that $\mathbf{k}_1$, $\mathbf{k}_2$, and $\mathbf{h}$ share a plane, which usually
does not include $\mathbf{n}$.


A BRDF with these properties is a Fresnel-weighted Phong-style cosine lobe 
model that is anisotropic. 

We again decompose the BRDF into a specular component and a diffuse component
(Figure 25.6). Accordingly, we write our BRDF as the classical sum of
two parts:

$$\rho(\mathbf{k}_1, \mathbf{k}_2) = \rho_s(\mathbf{k}_1, \mathbf{k}_2) + \rho_d(\mathbf{k}_1, \mathbf{k}_2), \qquad (25.4)$$

where the first term accounts for the specular reflection (this will be presented in
the next section). While it is possible to use the Lambertian BRDF for the diffuse
term $\rho_d(\mathbf{k}_1, \mathbf{k}_2)$ in our model, we will discuss a better solution in Section 25.5.2
and how to implement the model in Section 25.5.3. Readers who just want to
implement the model should skip to that section.


25.5.1 Anisotropic Specular BRDF 


To model the specular behavior, we use a Phong-style specular lobe but make this
lobe anisotropic and incorporate Fresnel behavior while attempting to preserve
the simplicity of the initial model. This BRDF is

$$\rho_s(\mathbf{k}_1, \mathbf{k}_2) = \frac{\sqrt{(n_u + 1)(n_v + 1)}}{8\pi} \, \frac{(\mathbf{n} \cdot \mathbf{h})^{n_u \cos^2\phi + n_v \sin^2\phi}}{(\mathbf{h} \cdot \mathbf{k}_i) \max(\cos\theta_i, \cos\theta_o)} \, F(\mathbf{k}_i \cdot \mathbf{h}). \qquad (25.5)$$

Here $\mathbf{k}_i$ and $\mathbf{k}_o$ denote the same two directions as $\mathbf{k}_1$ and $\mathbf{k}_2$.
Again we use Schlick’s approximation to the Fresnel equation:

$$F(\mathbf{k}_i \cdot \mathbf{h}) = R_s + (1 - R_s)(1 - (\mathbf{k}_i \cdot \mathbf{h}))^5, \qquad (25.6)$$

where $R_s$ is the material's reflectance for the normal incidence. Because $\mathbf{k}_i \cdot \mathbf{h} =
\mathbf{k}_o \cdot \mathbf{h}$, this form is reciprocal. We have an empirical model whose terms are
chosen to enforce energy conservation and reciprocity. A full rationalization for
the terms is given in the paper by Ashikhmin, listed in the chapter notes.

Figure 25.7. Metallic spheres for exponents 10, 100, 1000, 10000 increasing both left-to-right
and top-to-bottom.

The specular BRDF of Equation (25.5) is useful for representing metallic surfaces
where the diffuse component of reflection is very small. Figure 25.7 shows
a set of metal spheres on a texture-mapped Lambertian plane. As the values of
parameters $n_u$ and $n_v$ change, the appearance of the spheres shifts from rough
metal to almost perfect mirror, and from highly anisotropic to the more familiar
Phong-like behavior.

25.5.2 Diffuse Term for the Anisotropic Phong Model 

It is possible to use a Lambertian BRDF together with the anisotropic specular
term; this is done for most models, but it does not necessarily conserve energy. A
better approach is a simple angle-dependent form of the diffuse component which
accounts for the fact that the amount of energy available for diffuse scattering
varies due to the dependence of the specular term's total reflectance on the incident
angle. In particular, the diffuse color of a surface disappears near the grazing
angle, because the total specular reflectance is close to one. This well-known effect
cannot be reproduced with a Lambertian diffuse term and is therefore missed
by most reflection models.

Figure 25.8. Three views for $n_u = n_v = 400$ and a diffuse substrate. Note the change in
intensity of the specular reflection.

Following a similar approach to the coupled model, we can find a form of the
diffuse term that is compatible with the anisotropic Phong lobe:

$$\rho_d(\mathbf{k}_1, \mathbf{k}_2) = \frac{28 R_d}{23\pi} (1 - R_s) \left(1 - \left(1 - \frac{\cos\theta_i}{2}\right)^5\right) \left(1 - \left(1 - \frac{\cos\theta_o}{2}\right)^5\right). \qquad (25.7)$$

Here $R_d$ is the diffuse reflectance for normal incidence, and $R_s$ is the Phong lobe
coefficient. An example using this model is shown in Figure 25.8.


25.5.3 Implementing the Model 

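Recall that the BRDF is a combination of diffuse and specular components:

$$\rho(\mathbf{k}_1, \mathbf{k}_2) = \rho_s(\mathbf{k}_1, \mathbf{k}_2) + \rho_d(\mathbf{k}_1, \mathbf{k}_2). \qquad (25.8)$$

The diffuse component is given in Equation (25.7); the specular component is
given in Equation (25.5). It is not necessary to call trigonometric functions to
compute the exponent, so the specular BRDF can be written:

$$\rho_s(\mathbf{k}_1, \mathbf{k}_2) = \frac{\sqrt{(n_u + 1)(n_v + 1)}}{8\pi} \, \frac{(\mathbf{n} \cdot \mathbf{h})^{\left(n_u (\mathbf{h} \cdot \mathbf{u})^2 + n_v (\mathbf{h} \cdot \mathbf{v})^2\right) / \left(1 - (\mathbf{h} \cdot \mathbf{n})^2\right)}}{(\mathbf{h} \cdot \mathbf{k}_i) \max(\cos\theta_i, \cos\theta_o)} \, F(\mathbf{k}_i \cdot \mathbf{h}). \qquad (25.9)$$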
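Evaluating the model from Equations (25.7) and (25.9) needs only dot products. The sketch below is one possible arrangement, with hypothetical function names and tuple-based vectors; it assumes unit vectors k1 and k2 pointing away from the surface, an orthonormal frame (u, v, n), and treats the case h = n (where the exponent is 0/0) as a lobe value of 1.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def normalize(a):
    length = math.sqrt(dot(a, a))
    return tuple(ai / length for ai in a)


def schlick(rs, k_dot_h):
    return rs + (1.0 - rs) * (1.0 - k_dot_h) ** 5


def specular_brdf(k1, k2, n, u, v, rs, nu, nv):
    """Anisotropic specular lobe of Equation (25.9)."""
    cos_i, cos_o = dot(n, k1), dot(n, k2)
    if cos_i <= 0.0 or cos_o <= 0.0:
        return 0.0  # one of the directions is below the surface
    h = normalize(tuple(a + b for a, b in zip(k1, k2)))
    h_n, h_k = dot(h, n), dot(h, k1)
    one_minus = 1.0 - h_n * h_n
    if one_minus <= 1e-12:
        lobe = 1.0  # h coincides with n, so (n.h)^exponent is 1
    else:
        exponent = (nu * dot(h, u) ** 2 + nv * dot(h, v) ** 2) / one_minus
        lobe = h_n ** exponent
    norm = math.sqrt((nu + 1.0) * (nv + 1.0)) / (8.0 * math.pi)
    return norm * lobe / (h_k * max(cos_i, cos_o)) * schlick(rs, h_k)


def diffuse_brdf(k1, k2, n, rd, rs):
    """Non-Lambertian diffuse term of Equation (25.7)."""
    cos_i, cos_o = dot(n, k1), dot(n, k2)
    if cos_i <= 0.0 or cos_o <= 0.0:
        return 0.0
    scale = 28.0 * rd / (23.0 * math.pi) * (1.0 - rs)
    return (scale * (1.0 - (1.0 - 0.5 * cos_i) ** 5)
            * (1.0 - (1.0 - 0.5 * cos_o) ** 5))


def brdf(k1, k2, n, u, v, rd, rs, nu, nv):
    """Sum of the specular and diffuse components, as in Equation (25.8)."""
    return specular_brdf(k1, k2, n, u, v, rs, nu, nv) + diffuse_brdf(k1, k2, n, rd, rs)
```

Swapping k1 and k2 should leave the result unchanged up to rounding, which is a convenient reciprocity test for an implementation.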

In a Monte Carlo setting, we are interested in the following problem: given $\mathbf{k}_1$,
generate samples of $\mathbf{k}_2$ with a distribution whose shape is similar to the
cosine-weighted BRDF. Note that greatly undersampling a large value of the integrand is
a serious error, while greatly oversampling a small value is acceptable in practice.
The reader can verify that the densities suggested below have this property.

A suitable way to construct a pdf for sampling is to consider the distribution
of half vectors that would give rise to our BRDF. Such a function is

$$p_h(\mathbf{h}) = \frac{\sqrt{(n_u + 1)(n_v + 1)}}{2\pi} (\mathbf{n} \cdot \mathbf{h})^{n_u \cos^2\phi + n_v \sin^2\phi}, \qquad (25.10)$$

where the constants are chosen to ensure it is a valid pdf.

We can just use the probability density function $p_h(\mathbf{h})$ of Equation (25.10) to
generate a random $\mathbf{h}$. However, to evaluate the rendering equation, we need both
a reflected vector $\mathbf{k}_o$ and a probability density function $p(\mathbf{k}_o)$. It is important to
note that if you generate $\mathbf{h}$ according to $p_h(\mathbf{h})$ and then transform to the resulting
$\mathbf{k}_o$:

$$\mathbf{k}_o = -\mathbf{k}_i + 2(\mathbf{k}_i \cdot \mathbf{h})\mathbf{h}, \qquad (25.11)$$

the density of the resulting $\mathbf{k}_o$ is not $p_h(\mathbf{k}_o)$. This is because of the difference in
measures in $\mathbf{h}$ and $\mathbf{k}_o$. So the actual density $p(\mathbf{k}_o)$ is

$$p(\mathbf{k}_o) = \frac{p_h(\mathbf{h})}{4(\mathbf{k}_i \cdot \mathbf{h})}. \qquad (25.12)$$

Note that in an implementation where the BRDF is known to be this model, the
estimate of the rendering equation is quite simple as many terms cancel out.

It is possible to generate an $\mathbf{h}$ vector whose corresponding vector $\mathbf{k}_o$ will point 
inside the surface, i.e., $\cos\theta_o < 0$. The weight of such a sample should be set 
to zero. This situation corresponds to the specular lobe going below the horizon 
and is the main source of energy loss in the model. Clearly, this problem becomes 
progressively less severe as $n_u$, $n_v$ become larger. 

The only thing left now is to describe how to generate $\mathbf{h}$ vectors with the pdf 
of Equation (25.10). We will start by generating $\mathbf{h}$ with its spherical angles in 
the range $(\theta,\phi) \in [0,\frac{\pi}{2}] \times [0,\frac{\pi}{2}]$. Note that this is only the first quadrant of the 
hemisphere. Given two random numbers $(\xi_1,\xi_2)$ uniformly distributed in [0,1], 
we can choose 

$$\phi = \arctan\left(\sqrt{\frac{n_u+1}{n_v+1}}\,\tan\left(\frac{\pi\xi_1}{2}\right)\right), \tag{25.13}$$

and then use this value of $\phi$ to obtain $\theta$ according to 

$$\cos\theta = (1-\xi_2)^{1/(n_u\cos^2\phi+n_v\sin^2\phi+1)}. \tag{25.14}$$

To sample the entire hemisphere, we use the standard manipulation where $\xi_1$ is 
mapped to one of four possible functions depending on whether it is in [0, 0.25), 
[0.25, 0.5), [0.5, 0.75), or [0.75, 1.0). For example, for $\xi_1 \in [0.25, 0.5)$, find 
$\phi(1 - 4(0.5 - \xi_1))$ via Equation (25.13), and then “flip” it about the $\phi = \pi/2$ axis. This 
ensures full coverage and stratification. 
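
One possible reading of this sampling procedure is sketched below in Python. It assumes Equation (25.13) has the form given by Ashikhmin and Shirley, phi = arctan(sqrt((n_u+1)/(n_v+1)) tan(pi xi_1 / 2)). The mappings used for the third and fourth quadrants are also assumptions, chosen by symmetry with the [0.25, 0.5) example in the text. The function names are hypothetical.

```python
import math


def first_quadrant_phi(u, n_u, n_v):
    """Equation (25.13) for u in [0, 1); returns phi in [0, pi/2)."""
    return math.atan(math.sqrt((n_u + 1.0) / (n_v + 1.0)) * math.tan(0.5 * math.pi * u))


def sample_half_vector(xi1, xi2, n_u, n_v):
    """Return h in the local (u, v, n) frame from two uniform numbers in [0, 1)."""
    if xi1 < 0.25:
        phi = first_quadrant_phi(4.0 * xi1, n_u, n_v)
    elif xi1 < 0.5:
        # the case described in the text: flip about the phi = pi/2 axis
        phi = math.pi - first_quadrant_phi(1.0 - 4.0 * (0.5 - xi1), n_u, n_v)
    elif xi1 < 0.75:
        # assumed by symmetry: rotate the first quadrant by pi
        phi = math.pi + first_quadrant_phi(1.0 - 4.0 * (0.75 - xi1), n_u, n_v)
    else:
        # assumed by symmetry: flip about the phi = 0 axis
        phi = 2.0 * math.pi - first_quadrant_phi(1.0 - 4.0 * (1.0 - xi1), n_u, n_v)

    cos_phi, sin_phi = math.cos(phi), math.sin(phi)
    exponent = n_u * cos_phi * cos_phi + n_v * sin_phi * sin_phi
    cos_theta = (1.0 - xi2) ** (1.0 / (exponent + 1.0))  # Equation (25.14)
    sin_theta = math.sqrt(max(0.0, 1.0 - cos_theta * cos_theta))
    return (sin_theta * cos_phi, sin_theta * sin_phi, cos_theta)
```

Each quarter of the range of xi_1 lands in its own quadrant, so a stratified set of xi_1 values stays stratified over the hemisphere.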

For the diffuse term, use a simpler approach and generate samples according 
to a cosine distribution. This is sufficiently close to the complete diffuse BRDF 
to substantially reduce variance of the Monte Carlo estimation. 
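
A common way to draw cosine-distributed directions is sketched below in Python. The particular mapping (phi = 2 pi xi_1, sin^2(theta) = xi_2) is a standard choice rather than something specified in the text, and the function name is hypothetical. The direction is returned in a local frame with the surface normal along z, together with its density with respect to solid angle.

```python
import math


def sample_cosine_hemisphere(xi1, xi2):
    """Return (direction, pdf) with pdf = cos(theta) / pi."""
    phi = 2.0 * math.pi * xi1
    sin_theta = math.sqrt(xi2)        # sin^2(theta) is uniform in [0, 1)
    cos_theta = math.sqrt(1.0 - xi2)
    direction = (sin_theta * math.cos(phi), sin_theta * math.sin(phi), cos_theta)
    return direction, cos_theta / math.pi
```

Because the density already contains the cosine factor, that factor cancels when the sample weight (integrand divided by density) is formed.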


Frequently Asked Questions 

• My images look too smooth, even with a complex BRDF. What am I doing 
wrong? 

BRDFs only capture subpixel detail that is too small to be resolved by the eye. 
Most real surfaces also have some small variations, such as the wrinkles in skin, 
that can be seen. If you want true realism, some sort of texture or displacement 
map is needed. 

• How do I integrate the BRDF with texture mapping? 

Texture mapping can be used to control any parameter on a surface. So any kinds 
of colors or control parameters used by a BRDF should be programmable. 

• I have very pretty code except for my material class. What am I doing 
wrong? 

You are probably doing nothing wrong. Material classes tend to be the ugly thing 
in everybody’s programs. If you find a nice way to deal with it, please let me 
know! My own code uses a shader architecture (Hanrahan & Lawson, 1990) 
which makes the material include much of the rendering algorithm. 


Notes 

There are many BRDF models described in the literature, and only a few of them 
have been described here. Others include (Cook & Torrance, 1982; He et al., 
1992; G. J. Ward, 1992; Oren & Nayar, 1994; Schlick, 1994a; Lafortune et al., 
1997; Stam, 1999; Ashikhmin et al., 2000; Ershov et al., 2001; Matusik et al., 
2003; Lawrence et al., 2004; Stark et al., 2005). The desired characteristics of 
BRDF models are discussed in Making Shaders More Physically Plausible (R. R. 
Lewis, 1994). 


Exercises 

1. Suppose that instead of the Lambertian BRDF we used a BRDF of the form 
$C\cos^a\theta_i$. What must $C$ be to conserve energy? 

2. The BRDF in Exercise 1 is not reciprocal. Can you modify it to be 
reciprocal? 

3. Something like a highway sign is a retroreflector. This means that the 
BRDF is large when $\mathbf{k}_i$ and $\mathbf{k}_o$ are near each other. Make a model inspired 
by the Phong model that captures retroreflection behavior while being 
reciprocal and conserving energy. 




Computer Graphics in Games 


Of all the applications of computer graphics, computer and video games attract 
perhaps the most attention. The graphics methods selected for a given game have 
a profound effect, not only on the game engine code, but also on the art asset 
creation, and even sometimes on the gameplay, or core game mechanics. 

Although game graphics rely on the material in all of the preceding chapters, 
two chapters are particularly germane. Games need to make highly efficient use of 
graphics hardware, so an understanding of the material in Chapter 18 is important. 
Of course, games are interactive applications, and, as such, many of the principles 
detailed in Chapter 19 apply. 

In this chapter, I will detail the specific considerations that apply to graphics 
in game development, from the platforms on which games run to the game 
production process. 


26.1 Platforms 

Here, I use the term platform to refer to a specific combination of hardware, 
operating system, and API (application programming interface) for which a game 
is designed. Games run on a large variety of platforms, ranging from virtual 
machines used for browser-based games to dedicated game consoles using 
specialized hardware and APIs. 

In the past, it was common for games to be designed for a single platform. 
The increasing cost of game development has made this rare; multiplatform game 
development is now the norm. The incremental increase in development cost to 
support multiple platforms is more than repaid by a potential doubling or tripling 
of the customer base. 

Some platforms are quite loosely defined. For example, when developing a 
game for the Windows PC platform, the developer must account for a very large 
variety of possible hardware configurations. Games are even expected to run (and 
run well) on PC configurations that did not exist when the game was developed! 
This is only possible due to the abstractions afforded by the APIs defining the 
Windows platform. 

One way in which developers account for wide variance in graphics performance 
is by scaling (adjusting graphics quality in response to system capabilities). 
This can ensure reasonable performance on low-end systems, while still 
achieving competitive visuals on high-performance systems. This adjustment is 
sometimes done automatically by profiling the system performance, but more often 
this control is left in the hands of the user, who can best judge his personal 
preferences for quality versus speed. Display resolution is easiest to adjust, 
followed by antialiasing quality. It is also fairly common to offer several quality 
levels for visual effects such as shadows and motion blur, including the option of 
turning the effect off entirely. 

Differences in graphics performance can be so large that some machines may 
not run the game at a playable frame rate, even with the lowest quality settings; 
for this reason PC game developers publish minimum and recommended machine 
specifications for each game. 

As platforms, game consoles are strictly defined. When developing a game 
for, e.g., Nintendo’s Wii console, the developer knows exactly what hardware the 
game will run on. If the platform’s hardware implementation is changed (often 
done to reduce manufacturing costs), the console manufacturer must ensure that 
the new implementation behaves exactly like the previous one, including timing 
and performance. This is not to say that the console developer’s task is easy; 
console APIs tend to be much less abstract and closer to the underlying hardware. 
This gives console development its own set of difficulties. In some sense, 
multiplatform development (which commonly includes at least two different console 
platforms and often Windows as well) is the hardest of all, since the multiplatform 
game developer has neither the assurance of a fixed platform nor the convenience 
of a single high-level API. 

Browser-based virtual machines such as Adobe Flash are an interesting class 
of game platforms. Although such virtual machines run on a wide class of hardware, 
from personal computers to mobile phones, the high degree of abstraction 
provided by the virtual machine results in a stable and unified development platform. 
The relative ease of development for these platforms and the huge pool 
of potential customers make them increasingly attractive to game developers. 
However, these platforms are defined by the lowest common denominator of the 
supported hardware, and virtual machines have lower performance than native 
code on any given platform. For these reasons, such platforms are best suited to 
games with modest graphics requirements. 

Platforms can also be characterized by their openness to development, which 
is a business or legal distinction rather than a technical one. For example, Windows 
is open in the sense that development tools are widely available, and there 
are no gatekeepers controlling access to the marketplace of Windows games. Apple’s 
iPhone is a somewhat more restricted platform in that all applications need 
to pass a certification process and certain classes of applications are banned outright. 
Consoles are the most restrictive game platforms, where access to the development 
tools is tightly controlled. This is opening up somewhat with the introduction 
of online console game marketplaces, which tend to be more open. A 
particularly interesting example is Microsoft’s Xbox LIVE Community Games 
service, where the development tools are freely available and the “gatekeeping” is 
performed primarily by peer review. Games distributed through this service must 
use a virtual machine platform provided by Microsoft for security reasons. 

The game platform determines many elements of the game experience. For 
example, PC gamers use keyboard and mouse, while console gamers use specialized 
game controllers. Many console games support multiple players on the same 
console, either sharing a screen or providing a window for each player. Due to the 
difficulty of sharing keyboard and mouse, this type of play is not found on PC. A 
handheld game system will have a different control scheme than a touch-screen 
phone, etc. 

Although game platforms vary widely, some common trends can be discerned. 
Most platforms have multiple processing cores, divided between general-purpose 
(CPU) and graphics-specific (GPU). Performance gains over time are due mostly 
to increases in core count; gains in individual core performance are modest. As 
GPU cores grow in generality, the lines between GPU and CPU cores are increasingly 
blurred. Storage capacity tends to increase at a slower rate than processing 
power, and communication bandwidth (between cores as well as between each 
core and storage) grows at a slower pace still. 

26.2 Limited Resources 

One of the primary challenges of game graphics is the need to manage multiple 
pools of limited resources. Each platform imposes its own constraints on hardware 
resources such as processing time, storage, and memory bandwidth. At a 
higher level, development resources also need to be managed; there is a fixed-size 
team of programmers, artists and game designers with limited time to complete 
the game, hopefully without working too much overtime! This needs to be taken 
into account when deciding which graphics techniques to adopt. 


26.2.1 Processing Time 

Early game developers only had to worry about budgeting a single processor. 
Current game platforms contain multiple CPU and GPU cores. These processors 
need to be carefully synchronized to avoid deadlocks or excessive 
stalls. 

Since the time consumed by a single rendering command is highly variable, 
graphics processors are decoupled from the rest of the system via a command 
buffer. This buffer acts as a queue; commands are deposited on one end and 
the GPU reads rendering commands from the other. Increasing the size of this 
buffer decreases the chances of GPU starvation. It is fairly common for games to 
buffer an entire frame’s worth of rendering commands before sending them to the 
GPU; this guarantees that GPU starvation does not occur. However, this approach 
requires reserving enough storage space for two full frames’ worth of commands 
(the GPU works on one, while the CPU deposits commands in the other). It 
also increases the latency between the user’s input and the display, which can be 
problematic for fast-paced games. 
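
The double-buffered arrangement can be sketched as follows in Python. This is a toy model under stated assumptions, not a real graphics API: the class, its methods, and the command strings are all hypothetical, and the "GPU" is simply whichever command list was handed over at the end of the previous frame.

```python
class DoubleBufferedCommands:
    """Two command lists: the CPU fills one while the GPU drains the other."""

    def __init__(self):
        self._buffers = [[], []]
        self._write = 0  # index of the list the CPU is currently filling

    def record(self, command):
        self._buffers[self._write].append(command)

    def end_frame(self):
        """Hand the finished frame to the GPU and start recording the next one."""
        finished = self._buffers[self._write]
        self._write ^= 1
        self._buffers[self._write].clear()  # those commands were consumed last frame
        return finished


def simulate(frames):
    """Return (frame the CPU records, commands the GPU executes meanwhile)."""
    commands = DoubleBufferedCommands()
    in_flight = None  # command list the GPU is working on
    timeline = []
    for frame in frames:
        commands.record(f"draw scene for frame {frame}")
        timeline.append((frame, list(in_flight) if in_flight is not None else []))
        in_flight = commands.end_frame()
    return timeline
```

In the resulting timeline the GPU is always one frame behind the CPU, which is the extra latency described above.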

Processing budgets are determined by the frame rate, which is the frequency 
at which the frame buffer is refreshed with new renderings of the scene. On fixed 
platforms (such as consoles), the frame rate experienced by the user is essentially 
the same one seen by the game developer, so fairly strict frame-rate limits can be 
imposed. Most games target a frame rate of 30 frames per second (fps); in games 
where response latency is especially important, the target is often 60 fps. On 
highly variable platforms (such as PCs), the frame-rate budgets are (by necessity) 
defined more loosely. 

The required frame rate gives the graphics programmer a fixed budget per 
frame to work with. In the case of a 30 fps target, the CPU cores have 33 milliseconds 
to gather inputs, process the game logic, perform any physical simulations, 
traverse the scene description, and send the rendering commands to the graphics 
hardware. In parallel, other tasks such as audio and network processing must be 
handled, with their own required response times. While this is happening, the 
GPU is typically executing the graphics commands submitted during the previous 
frame. 
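
The arithmetic behind such a budget is simple, as the Python sketch below shows. The per-task costs are invented purely for illustration and the function names are hypothetical; only the 30 fps and 60 fps targets come from the text.

```python
def frame_budget_ms(target_fps):
    """Time available per frame, in milliseconds."""
    return 1000.0 / target_fps


def remaining_budget_ms(target_fps, task_costs_ms):
    """Budget left after the listed CPU tasks; negative means over budget."""
    return frame_budget_ms(target_fps) - sum(task_costs_ms.values())


# Hypothetical CPU costs for one frame, in milliseconds.
cpu_tasks_ms = {
    "gather inputs": 1.0,
    "game logic": 8.0,
    "physical simulation": 6.0,
    "scene traversal": 7.0,
    "command submission": 5.0,
}
```

With these made-up numbers the tasks total 27 ms, which fits a 30 fps budget of about 33.3 ms but overruns a 60 fps budget of about 16.7 ms.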

In most cases, CPU cores are a homogeneous resource; all cores are the same, 
and any of them is equally well suited to a given workload (there are some 
exceptions, such as the Cell processor used in Sony’s PLAYSTATION 3 console). 

In contrast, GPUs contain a heterogeneous mix of resources, each specialized 
to a certain set of tasks. Some of these resources consist of fixed-function 
hardware (for triangle rasterization, alpha blending, and texture sampling), and 
some are programmable cores. On older GPUs, programmable cores were further 
differentiated into vertex and pixel processing cores; newer GPU designs have 
unified shader cores which can execute any of the programmable shader types. 

Such heterogeneous resources are budgeted separately. Typically, at any point, 
only one resource type will be the bottleneck, and the others will have excess capacity. 
On the one hand, this is good, since this capacity can be leveraged to 
improve visual quality without decreasing performance. On the other hand, it 
makes it harder to improve performance, since decreasing usage of any of the 
non-bottleneck resources will have no effect. Even decreasing usage of the bottleneck 
resource may only improve performance slightly, depending on the degree 
of utilization of the “next bottleneck.” 
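
A simplified model makes this behavior concrete. The Python sketch below assumes the GPU's units work concurrently, so that the frame time is roughly the time of the slowest unit; the unit names and timings are invented for illustration.

```python
def bottleneck(unit_times_ms):
    """Name of the unit that takes longest."""
    return max(unit_times_ms, key=unit_times_ms.get)


def frame_time_ms(unit_times_ms):
    """Frame time under the assumption that all units run in parallel."""
    return max(unit_times_ms.values())


# Hypothetical per-frame workload of each unit, in milliseconds.
baseline = {
    "vertex shading": 5.0,
    "rasterization": 4.0,
    "pixel shading": 14.0,
    "texture sampling": 12.0,
}
less_vertex_work = dict(baseline, **{"vertex shading": 2.0})
less_pixel_work = dict(baseline, **{"pixel shading": 7.0})
```

In this model, cutting vertex work leaves the frame time at 14 ms, while halving the pixel shading work only brings it down to 12 ms, because texture sampling becomes the next bottleneck.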


26.2.2 Storage 

Game platforms, like any modern computing system, possess multi-stage storage 
hierarchies, with smaller, faster memory types at the top and larger, slower 
storage at the bottom. This arrangement is born of engineering necessity, although 
it does complicate life for the developer. Most platforms include optical 
disc storage, which is extremely slow and is used mostly for delivery. On platforms 
such as Windows, a lengthy installation process is performed once to move 
all data from the optical disc onto the hard drive, which is significantly faster. 
The optical disc is never used again (except as an anti-piracy measure). On console 
platforms, this is less common, although it does sometimes happen when a 
hard drive is guaranteed to be present, as on Sony's PLAYSTATION 3 console. 
More often, the hard drive (if present) is only used as a cache for the optical 
disc. 

The next step up the memory hierarchy is RAM, which on many platforms is 
divided into general system RAM and VRAM (video RAM) which benefits from 
a high-speed interface to the graphics hardware. A game level may be too large to 
fit in RAM, in which case the game developer needs to manage moving the data 
in and out of RAM as needed. On platforms such as Windows, virtual memory 
is often used for this. On console platforms, custom data streaming and caching 
systems are typically employed. 

Finally, both the CPU and GPU boast various kinds of on-chip memory and 
caches. These are extremely small and fast and are usually managed by the graphics 
API. 

Graphics resources take up a lot of memory, so they are a primary focus of 
storage budgets in game development. Textures are usually the greatest memory 
consumers, followed by geometry (vertex data), and finally other types of graphics 
data such as animations. Not all memory can be used for graphics—audio also 
takes up a fair bit, and game logic may use sizeable data structures. As in the case 
of processing time, budgeting tends to be somewhat looser on Windows, where 
the exact amount of memory present on the user’s system is unknown and virtual 
memory covers a multitude of sins. In contrast, memory budgeting on console 
platforms is quite strict—often the lead programmer keeps track of memory on a 
spreadsheet and a programmer requiring more memory for their system needs to 
beg, borrow, or steal it from someone else. 

The various levels of the memory hierarchy differ not only in size, but also in 
access speed. This has two separate dimensions: latency and bandwidth. 

Latency is the time that elapses between a storage access request and its final 
fulfillment. This varies from a few clock cycles (for on-chip cache) to millions of 
clock cycles (for data residing on optical disc). Latency is usually an issue for read 
access (although write latency can also be an issue if the result needs to be read 
back from memory soon after). In some cases, the read request is blocking, which 
means that the processor core that submitted the read can do nothing else until the 
request is fulfilled. In other cases, the read is non-blocking; the processing core 
can submit the read request, do other types of processing, and then use the results 
of the read after it has arrived. Texture accesses by the GPU are an example of 
non-blocking reads; an important aspect of GPU design is to find ways to “hide” 
texture read latency by performing unrelated computations while the texture read 
is being fulfilled. 

For this latency hiding to work, there must be a sufficient amount of computation 
relative to texture accesses. This is an important consideration for the shader 
writer; the optimal mix of computation vs. texture access keeps changing (in favor 
of more computation) as memory fails to keep up with increases in processing 
power. 

Bandwidth refers to the maximum rate of transfer to and from storage. It is 
typically measured in gigabytes per second. 

26.2.3 Development Resources 

Besides hardware resources, such as processing power and storage space, the 
game graphics programmer also has to contend with a different kind of limited 
resource: the time of his team-mates! When selecting graphics techniques, the 
engineering resources needed to implement each technique must be taken into account, 
as well as any tools necessary to compute the input data (in many cases, 
tools can take significantly more time than implementing the technique itself). 
Perhaps most importantly, the impact on artist productivity must be taken into account. 
Most graphics techniques use assets created by game artists, who comprise 
by far the largest part of most modern game teams. The graphics programmer 
must foster the artist’s productivity and creativity, which will ultimately determine 
the visual quality of the game. 


26.3 Optimization Techniques 

Making wise use of these limited resources is the primary challenge of the game 
graphics programmer. To this end, various optimization techniques are commonly 
employed. 

In many games, pixel shader processing is a primary bottleneck. Most GPUs 
contain hierarchical depth-culling hardware which can avoid executing pixel 
shaders on occluded surfaces. To make good use of this hardware, opaque objects 
can be rendered front-to-back, so that nearer surfaces fill the depth buffer 
first. Alternatively, optimal depth-culling usage 
can be achieved by performing a depth pre-pass, i.e., rendering all the opaque 
objects into the depth buffer (without any color output or pixel shaders) before 
rendering the scene normally. This does incur some overhead (due to the need to 
render every object twice), but in many cases the performance gain is worth it. 
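
Both orderings can be sketched in a few lines of Python. The sketch assumes each opaque object is a dictionary with a hypothetical "name" and a "center" position, and it sorts by distance from the camera so that the nearest objects are drawn first. The pass names are illustrative labels, not calls into any real graphics API.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def front_to_back(opaque_objects, camera_position):
    """Nearest objects first, so nearer surfaces reach the depth buffer early."""
    return sorted(opaque_objects,
                  key=lambda obj: squared_distance(obj["center"], camera_position))


def build_draw_list(opaque_objects, camera_position, depth_prepass=False):
    """Return (pass, object name) pairs in submission order."""
    ordered = front_to_back(opaque_objects, camera_position)
    draws = []
    if depth_prepass:
        # First lay down depth only: no color output, no pixel shaders.
        draws += [("depth_only", obj["name"]) for obj in ordered]
    draws += [("color", obj["name"]) for obj in ordered]
    return draws
```

With the pre-pass enabled every object appears twice in the draw list, which is the overhead mentioned above.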

The fastest way to render an object is to not render it at all; thus any method 
of discerning early on that an object is occluded can be useful. This saves not 
only pixel processing but also vertex processing and even CPU time that would 
be spent submitting the object to the graphics API. View frustum culling (see 
Section 8.4.1) is universally employed, but in many games it is not sufficient. 
High-level occlusion culling algorithms are often used, utilizing data structures 
such as PVS (potentially visible sets) or BSP (binary space partitioning) trees to 
quickly narrow down the pool of potentially visible objects. 
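
As a minimal illustration of view frustum culling, the Python sketch below tests a bounding sphere against a set of planes. It assumes each plane is a tuple (a, b, c, d) with a unit normal pointing into the view volume, so that a point is inside when a x + b y + c z + d >= 0. The function names are hypothetical.

```python
def signed_distance(plane, point):
    a, b, c, d = plane
    return a * point[0] + b * point[1] + c * point[2] + d


def sphere_may_be_visible(center, radius, planes):
    """False only when the sphere lies entirely outside at least one plane."""
    return all(signed_distance(plane, center) >= -radius for plane in planes)
```

The test is conservative: a sphere near a corner of the volume may be reported as visible even though it lies just outside, which costs some rendering time but never removes a visible object.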

Even if an object is visible, it may be at such a distance that most of its detail 
can be removed without apparent effect. LOD (level-of-detail) algorithms render 
different representations of an object based on distance (or other factors, such 
as screen coverage or importance). This can save significant processing, vertex 
processing in particular. Examples can be seen in Figure 26.1. 
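
A distance-based LOD choice might look like the following Python sketch. The switch distances and mesh names are invented for illustration, and the function names are hypothetical; a real engine might instead select by screen coverage or importance, as noted above.

```python
import bisect


def select_lod(distance, switch_distances):
    """Index of the level of detail to use; 0 is the most detailed.

    switch_distances must be sorted in ascending order.
    """
    return bisect.bisect_right(switch_distances, distance)


def choose_mesh(distance, meshes, switch_distances):
    """Pick a mesh, clamping to the coarsest one that is available."""
    index = min(select_lod(distance, switch_distances), len(meshes) - 1)
    return meshes[index]
```

For example, with meshes ordered from most to least detailed and switch distances of 20 and 60 units, an object 5 units away uses the first mesh and one 100 units away uses the last.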

In many cases, processing can be performed before the game even starts. The 
results of such preprocessing can be stored and used each frame, thus speeding 
up the game. This is most commonly employed for lighting, where global illumination 
algorithms are utilized to compute lighting throughout the scene and store 
it in lightmaps and other data structures for later use. 

Figure 26.1. Two examples of game objects at a varying level of detail. The small inset 
images show the relative sizes at which the simplified models might be used. Upper row of 
images courtesy Crytek; lower row courtesy Valve Corp. 


26.4 Game Types 

Since game requirements vary widely, the selection of graphics techniques is 
driven by the exact type of game being developed. 

The allocation of processing time depends strongly on the frame rate. Currently, 
most console games tend to target 30 frames per second, since this enables 
much higher graphics quality. However, certain game types with fast gameplay 
require very low latency, and such games typically render at 60 frames per second. 
This includes music games such as Guitar Hero and first-person shooters such as 
Call of Duty. 

The frame rate determines the available time to render the scene. The composition 
of the scene itself also varies widely from game to game. Most games have 
a division between background geometry (scenery, mostly static) and foreground 
geometry (characters and dynamic objects). These are handled differently by the 
rendering engine. For example, background geometry will often have lightmaps 
containing precomputed lighting, which is not feasible for foreground objects. 
Precomputed lighting is typically applied to foreground objects via some type of 
volumetric representation which can take account of the changing position of each 
object over time. 

Some games have relatively enclosed environments, where the camera remains 
largely in place. The purest examples are fighting games such as the Street 
Fighter series, but this is also true to some extent for games such as Devil May Cry 
and God of War. These games have cameras that are not under direct player control, 
and the game play tends to move from one enclosed environment to another, 
spending a significant amount of playing time in each. This allows the game developer 
to lavish large amounts of resources (processing, storage, and artist time) 
on each room or enclosed environment, resulting in very high levels of graphics 
fidelity. 

Other games have extremely large worlds, where the player can move about 
freely. This is most true for “sandbox games” such as the Grand Theft Auto series 
and online role-playing games such as World of Warcraft. Such games pose great 
challenges to the graphics developer, since resource allocation is very difficult 
when during each frame the player can see a large extent of the world. Further 
complicating things, the player can freely go to some formerly distant part of the 
world and observe it from up close. Such games typically have changing time of 
day, which makes precomputation of lighting difficult at best, if not impossible. 

Most games, such as first-person shooters, are somewhere between the two 
extremes. The player can see a fair amount of scenery each frame, but movement 
through the game world is somewhat constrained. Many games also have a fixed 
time of day for each game level, for ease of lighting precomputation. 

The number of foreground objects rendered also varies widely between game 
types. Real-time strategy games such as the Command and Conquer series often 
have many dozens, if not hundreds, of units visible on screen. Other types of 
games have more limited quantities of visible characters, with fighting games 
at the opposite extreme, where only two characters are visible, each rendered 
with extremely high detail. A distinction must be drawn between the number of 
characters visible at any time (which affects budgeting of processing time) and 
the number of unique characters which can potentially be visible at short notice 
(which affects storage budgets). 

The type or genre of game also determines audience expectations of the graphics. 
For example, first-person shooters have historically had very high levels of 
graphics fidelity, and this expectation drives the graphics design when developing 
new games in that genre; see Figure 26.2. On the other hand, puzzle games have 
typically had relatively simplistic graphics, so most game developers will not invest 
large amounts of programming or art resources into developing photorealistic 
graphics for such games. 

Figure 26.2. Crysis exemplifies the realistic and detailed graphics expected of first-person 
shooters. Image courtesy Crytek. (See also Plate XXXIII.) 

Figure 26.3. An example of highly stylized, non-photorealistic rendering from the game 
Okami. Image courtesy Capcom Entertainment, Inc. (See also Plate XXXIV.) 

Although most games aim for a photorealistic look, a few do attempt more 
stylized rendering. One interesting example of this is Okami, which can be seen 
in Figure 26.3. 

The management of development resources also differs by game type. Most 
games have a closed development cycle of one to two years, which ends after 
the game ships. Recently it has become common to have downloadable content 
(DLC), which can be purchased after the game ships, so some development resources 
need to be reserved for that. Persistent-world online games have a never-ending 
development process where new content is continually being generated, 
at least as long as the game is economically viable (which may be a period of 
decades). 

The creative exploitation of the specific requirements and restrictions of a 
particular game is the hallmark of a skilled game graphics programmer. A good 
example is the game LittleBigPlanet, which has a “two-and-a-half-dimensional” 
game world comprising a small number of two-dimensional layers, as well as a 
non-interactive background. The graphics quality of this game is excellent, driven 
by the use of unusual rendering techniques specialized to this type of environment; 
see Figure 26.4. 

Figure 26.4. The LittleBigPlanet developers took care to choose techniques that fit the 
game’s constraints, combining them in unusual ways to achieve stunning results. 
LittleBigPlanet © 2007 Sony Computer Entertainment Europe. Developed by Media Molecule. 
LittleBigPlanet is a trademark of Sony Computer Entertainment Europe. (See also Plate 
XXXV.) 


26.5 The Game Production Process 

The game production process starts with the basic game design or concept. In 
some cases (such as sequels), the basic gameplay and visual design is clear, and 
only incremental changes are made. In the case of a new game type, extensive 
prototyping is needed to determine gameplay and design. Most cases sit somewhere 
in the middle, where there are some new gameplay elements and the visual 
design is somewhat open. After this step there may be a greenlight stage where 
some early demo or concept is shown to the game publisher to get approval (and 
funding!) for the game. 

The next step is typically pre-production. While other teams are working 
on finishing up the last game, a small core team works on making any needed 
changes to the game engine and production tool chain, as well as working out the 
rough details of any new gameplay elements. This core team is working under a 
strict deadline. After the existing game ships and the rest of the team comes back 
from a well-deserved vacation, the entire tool chain and engine must be ready for 
them. If the core team misses this deadline, several dozen developers may be left 
idle—an extremely expensive proposition! 

Full production is the next step, with the entire team creating art assets, designing 
levels, tweaking gameplay, and implementing further changes to the game 
engine. In a perfect world, everything done during this process would be used in 
the final game, but in reality there is an iterative nature to game development 
which will result in some work being thrown out and redone. The goal is to minimize 
this with careful planning and prototyping. 

When the game is functionally complete, the final stage begins. The term 
alpha release usually refers to the version which marks the start of extensive 
internal testing, beta release to the one which marks the start of extensive external 
testing, and gold release to the final release submitted to the console manufacturer, 
but different companies have slightly varying definitions of these terms. In any 
case, testing, or quality assurance (QA) is an important part of this phase, and it 
involves testers at the game development studio, at the publisher, at the console 
manufacturer, and possibly external QA contractors as well. These various rounds 
of testing result in bug reports which are submitted back to the game developers 
and worked on until the next release. 

After the game ships, most of the developers go on vacation for a while, but 
a small team may have to stay to work on patches or downloadable content. In 
the meantime, a small core team has been working on pre-production for the next 
game ... 

Art asset creation is an aspect of game production that is particularly relevant 
to graphics development, so I will go into it in some detail. 


26.5.1 Asset Creation 

While the exact process of art asset creation varies from game to game, the outline 
I give here is fairly representative. In the past, a single artist would create an 
entire asset from start to finish, but this process is now much more specialized, 
involving people with different skill sets working on each asset at various times. 
Some of these stages have clear dependencies (for example, a character cannot be 
animated until it is rigged and cannot be rigged before it is modeled). Most game 
developers have well-defined approval processes, where the art director or a lead 
artist signs off on each stage before the asset is sent on to the next. Ideally an 
asset proceeds through each stage exactly once, but in practice changes may be 
made that require resubmission. 



Figure 26.5. A mesh being modeled in Maya, with associated texture parameterization. 
Image courtesy Keith Bruns. 

Initial Modeling 

Typically the art asset creation process starts by modeling the object geometry. 
This step is performed in a general-purpose modeling package such as Maya, 
MAX, or Softimage. The modeled geometry will be passed directly to the game
engine, so it is important to minimize vertex count while preserving good
silhouettes. Character meshes must also be constructed so as to be amenable to
animation. 

In this stage, a two-dimensional surface parameterization for textures is usually
created. It is important that this parameterization be highly continuous, since
discontinuities require vertex duplication and may cause filtering artifacts. An
example of a mesh with its associated texture parameterization is shown in
Figure 26.5.

Texturing 

In the past, texturing was a straightforward process of painting a color texture,
typically in Photoshop. Now specialized detail modeling packages such as ZBrush
or Mudbox are commonly used to sculpt fine surface detail. Figures 26.6 and 26.7
show an example of this process.



Figure 26.6. The mesh from Figure 26.5 has been brought into ZBrush for detail modeling. 
Image courtesy Keith Bruns. 

Figure 26.7. The mesh from Figure 26.6, with fine detail added to it in ZBrush. Image
courtesy Keith Bruns. 



Figure 26.8. A visualization (in ZBrush) of the mesh from Figure 26.6, rendered with a 
normal map derived from the detailed mesh in Figure 26.7. The bottom of the figure shows 
the interface for ZBrush’s “Zmapper” tool, which was used to derive the normal map. Image 
courtesy Keith Bruns. 

Figure 26.9. The normal map used in Figure 26.8. In this image, the red, green, and blue
channels of the texture contain the X, Y, and Z coordinates of the surface normals. Image 
courtesy Keith Bruns. (See also Plate XXXVI.) 



Figure 26.10. An early version of a diffuse color texture for the mesh from Figure 26.8, 
shown in Photoshop. Image courtesy Keith Bruns. (See also Plate XXXVII.) 

Figure 26.11. A rendering (in ZBrush) of the mesh with normal map and early diffuse color
texture (from Figure 26.10) applied. Image courtesy Keith Bruns. (See also Plate XXXVIII.) 



Figure 26.12. Final version of the color texture from Figure 26.10. Image courtesy Keith 
Bruns. (See also Plate XXXIX.) 

Figure 26.13. Rendering of the mesh with normal map and final color texture (from
Figure 26.12) applied. Image courtesy Keith Bruns. (See also Plate XL.)


If this additional detail were to be represented with actual geometry, millions 
of triangles would be needed. Instead, the detail is commonly “baked” into a normal
map, which is applied onto the original, coarse mesh, as shown in Figures 26.8
and 26.9. 
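
As a rough illustration of how a normal can be stored in a color texture, the sketch below maps each normal component from [-1, 1] to an 8-bit channel value and back. The text says only that the red, green, and blue channels hold the X, Y, and Z coordinates; the specific scale-and-bias mapping and the function names here are assumptions based on a common convention, not a description of any particular tool.

```python
def encode_normal(n):
    """Map a unit normal with components in [-1, 1] to 8-bit RGB (assumed convention)."""
    return tuple(int((c * 0.5 + 0.5) * 255 + 0.5) for c in n)


def decode_normal(rgb):
    """Recover an approximate unit normal from 8-bit RGB, renormalizing after quantization."""
    x, y, z = (c / 255 * 2.0 - 1.0 for c in rgb)
    length = (x * x + y * y + z * z) ** 0.5
    return (x / length, y / length, z / length)


# A normal pointing straight out of the surface becomes the familiar light blue.
flat = encode_normal((0.0, 0.0, 1.0))
recovered = decode_normal(flat)
```

Under this mapping, an unperturbed normal encodes to a mostly blue texel, which would be consistent with the bluish appearance typical of such maps.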

Besides normal maps, multiple textures containing surface properties such as 
diffuse color, specular color, and smoothness (specular power) are also created. 
These are either painted directly on the surface in the detail modeling application, 
or in a two-dimensional application such as Photoshop. All of these texture maps 
use the surface parameterization defined in the initial modeling phase. When the 
texture is painted in a two-dimensional painting application, the artist must
frequently switch between the painting application and some other application that
can show a three-dimensional rendering of the object with the texture applied.
This iterative process is illustrated in Figures 26.10, 26.11, 26.12, and 26.13.


Shading 

Shaders are typically applied in the same application used for initial modeling. In 
this process, a shader (from the set of shaders defined for that game) is applied 
to the mesh. The various textures resulting from the detail modeling stage are 
applied as inputs to this shader, using the surface parameterization defined during 
initial modeling. Various other shader inputs are set via visual experimentation 
(“tweaking”); see Figure 26.14. 

Figure 26.14. Shader configuration in Maya. The interface on the right is used to select
the shader, assign textures to shader inputs, and set the values of non-texture shader inputs
(such as the “Specular Color” and “Specular Power” sliders). The rendering on the left is
updated dynamically while these properties are modified, enabling immediate visual feedback.
Image courtesy Keith Bruns. (See also Plate XLI.) 


Lighting 

In the case of background scenery, lighting artists will typically start their work 
after modeling, texturing, and shading have been completed. Light sources are
placed and their effect computed in a pre-processing step. The results of this 
process are stored in lightmaps for later use by the rendering engine. 

Animation 

Character meshes undergo several additional steps related to animation. The primary
method used to animate game characters is skinning. This requires a rig,
consisting of a hierarchy of transform nodes that is attached to the character, a 
process known as rigging. The area of effect of each transform node is painted 
onto a subset of mesh vertices. Finally, animators create animations that move, 
rotate, and scale these transform nodes, “dragging” the mesh behind them. 
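
The way painted weights let transform nodes "drag" the mesh can be sketched as a weighted blend of the positions each node would produce on its own. This is a simplified two-dimensional version of what is often called linear blend skinning; the bone names, the affine-tuple layout, and the omission of bind-pose handling are all assumptions made for brevity.

```python
def apply_affine(m, p):
    """Apply a 2D affine transform m = (a, b, c, d, tx, ty) to point p."""
    a, b, c, d, tx, ty = m
    x, y = p
    return (a * x + b * y + tx, c * x + d * y + ty)


def skin_vertex(rest_position, influences, bone_transforms):
    """Blend the positions produced by each influencing transform node.

    influences: list of (bone_name, weight) pairs whose weights sum to 1.
    """
    x = y = 0.0
    for bone, weight in influences:
        px, py = apply_affine(bone_transforms[bone], rest_position)
        x += weight * px
        y += weight * py
    return (x, y)


# Hypothetical two-bone rig: the forearm has moved, the upper arm has not.
bones = {
    "upper_arm": (1, 0, 0, 1, 0.0, 0.0),  # identity transform
    "forearm": (1, 0, 0, 1, 0.0, 2.0),    # translated up by 2 units
}
# A vertex near the elbow, influenced equally by both bones, moves halfway.
elbow_vertex = skin_vertex((1.0, 0.0), [("upper_arm", 0.5), ("forearm", 0.5)], bones)
```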

A typical game character will have many dozens of animations, corresponding
to different modes of motion (walking, running, turning) as well as different
actions such as attacks. In the case of a main character, the number of animations 
can be in the hundreds. Transitions between different animations also need to be 
defined. 

Figure 26.15. Morph target interface in Maya. The bottom row shows four different morph
targets, and the model at the top shows the effects of combining several morph targets 
together. The interface at the upper left is used to control the degree to which each morph 
target is applied. Image courtesy Keith Bruns. 


For facial animation, another technique, called morph targets, is sometimes
employed. In this technique, the mesh vertices are directly manipulated to deform 
the mesh. Different copies of the deformed mesh are stored (e.g., for different 
facial expressions) and combined by the game engine at runtime. The creation of 
morph targets is shown in Figure 26.15. 
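
One plausible way for an engine to combine stored morph targets is to add a weighted sum of each target's offset from the base mesh. The text does not specify the combination rule, so the delta-based blend below, along with the target names and data layout, is an assumption offered only as a sketch.

```python
def blend_morph_targets(base, targets, weights):
    """Return base vertices displaced by weighted offsets toward each morph target.

    base:    list of (x, y, z) vertex positions
    targets: dict mapping target name -> list of (x, y, z), same length as base
    weights: dict mapping target name -> blend weight (typically in [0, 1])
    """
    result = []
    for i, b in enumerate(base):
        v = list(b)
        for name, w in weights.items():
            t = targets[name][i]
            for k in range(3):
                v[k] += w * (t[k] - b[k])
        result.append(tuple(v))
    return result


# Hypothetical two-vertex "mouth corner" mesh with two expression targets.
base_mesh = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
morphs = {
    "smile": [(0.0, 1.0, 0.0), (1.0, 0.0, 0.0)],
    "frown": [(0.0, -1.0, 0.0), (1.0, 0.0, 0.0)],
}
half_smile = blend_morph_targets(base_mesh, morphs, {"smile": 0.5})
```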


Notes 

There is a huge amount of information on real-time rendering and game programming
available, both in books and online. Here are some resources I can
recommend from personal familiarity: 

Game Developer Magazine is a good source of information on game development,
as are slides from the talks given at the annual Game Developers Conference
(GDC) and Microsoft’s Gamefest conference. The GPU Gems and ShaderX
book series also contain good information; all of the former and the first two of
the latter are also available online.

Eric Lengyel’s Mathematics for 3D Game Programming & Computer Graphics,
now in its second edition, is a good reference for the various types of math
used in graphics and games. A specific area of game programming that is closely
related to graphics is collision detection, for which Christer Ericson’s Real-Time
Collision Detection is the definitive resource.

Since its first edition in 1999, Eric Haines and Tomas Akenine-Möller’s Real-Time
Rendering has endeavored to cover this fast-growing field in a thorough
manner. As a longtime fan of this book, I was glad to have the opportunity to be 
a coauthor on the third edition, which came out in mid-2008. 

Reading is not enough—make sure you play a variety of games regularly to 
get a good idea of the requirements of various game types, as well as the current 
state of the art. 


Exercises 

1. Examine the visuals of two dissimilar games. What differences can you 
deduce in the graphics requirements of these two games? Analyze the effect 
on rendering time, storage budgets, etc. 



Tamara Munzner 
Visualization


A major application area of computer graphics is visualization, where
computer-generated images are used to help people understand both spatial and non-spatial
data. Visualization is used when the goal is to augment human capabilities in
situations where the problem is not sufficiently well defined for a computer to
handle algorithmically. If a totally automatic solution can completely replace human
judgement, then visualization is not typically required. Visualization can be
used to generate new hypotheses when exploring a completely unfamiliar dataset,
to confirm existing hypotheses in a partially understood dataset, or to present
information about a known dataset to another audience.

Visualization allows people to offload cognition to the perceptual system, using
carefully designed images as a form of external memory. The human visual
system is a very high-bandwidth channel to the brain, with a significant amount
of processing occurring in parallel and at the pre-conscious level. We can thus
use external images as a substitute for keeping track of things inside our own
heads. For an example, let us consider the task of understanding the relationships
between a subset of the topics in the splendid book Gödel, Escher, Bach: The
Eternal Golden Braid (Hofstadter, 1979); see Figure 27.1.

When we see the dataset as a text list, at the low level we must read words
and compare them to memories of previously read words. It is hard to keep track
of just these dozen topics using cognition and memory alone, let alone the hundreds
of topics in the full book. The higher-level problem of identifying neighborhoods,
for instance finding all the topics two hops away from the target topic
Paradoxes, is very difficult.

Infinity - Lewis Carroll
Infinity - Zeno
Infinity - Paradoxes
Infinity - Halting problem
Zeno - Lewis Carroll
Paradoxes - Lewis Carroll
Paradoxes - Epimenides
Paradoxes - Self-ref
Epimenides - Self-ref
Epimenides - Tarski
Tarski - Epimenides
Halting problem - Decision procedures
Halting problem - Turing
Lewis Carroll - Wordplay
Tarski - Truth vs. provability
Tarski - Undecidability


Figure 27.1. Keeping track of relationships between topics is difficult using a text list. 


Figure 27.2 shows an external visual representation of the same dataset as a 
node-link graph, where each topic is a node and the linkage between two topics 
is shown directly with a line. Following the lines by moving our eyes around the 
image is a fast low-level operation with minimal cognitive load, so higher-level 
neighborhood finding becomes possible. The placement of the nodes and the 
routing of the links between them was created automatically by the dot graph 
drawing program (Gansner et al., 1993). 
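
For comparison with the visual approach, the two-hop neighborhood query can also be answered computationally. The sketch below treats the topic pairs of Figure 27.1 as undirected links and expands outward from Paradoxes one hop at a time; the function names are illustrative, and the undirected interpretation of the list is an assumption.

```python
from collections import defaultdict

# Topic pairs transcribed from the text list in Figure 27.1.
EDGES = [
    ("Infinity", "Lewis Carroll"), ("Infinity", "Zeno"),
    ("Infinity", "Paradoxes"), ("Infinity", "Halting problem"),
    ("Zeno", "Lewis Carroll"), ("Paradoxes", "Lewis Carroll"),
    ("Paradoxes", "Epimenides"), ("Paradoxes", "Self-ref"),
    ("Epimenides", "Self-ref"), ("Epimenides", "Tarski"),
    ("Tarski", "Epimenides"), ("Halting problem", "Decision procedures"),
    ("Halting problem", "Turing"), ("Lewis Carroll", "Wordplay"),
    ("Tarski", "Truth vs. provability"), ("Tarski", "Undecidability"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map; duplicate pairs collapse in the sets."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def topics_at_distance(adjacency, start, hops):
    """Return the topics whose shortest path from start is exactly `hops` links."""
    frontier = {start}
    visited = {start}
    for _ in range(hops):
        frontier = {n for node in frontier for n in adjacency[node]} - visited
        visited |= frontier
    return frontier


two_hops = sorted(topics_at_distance(build_adjacency(EDGES), "Paradoxes", 2))
# two_hops == ['Halting problem', 'Tarski', 'Wordplay', 'Zeno']
```

The program finds the answer easily, but a person working from the text list has to perform the same bookkeeping from memory, which is the difficulty the node-link drawing removes.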

We call the mapping of dataset attributes to a visual representation a visual 
encoding. One of the central problems in visualization is choosing appropriate 
encodings from the enormous space of possible visual representations, taking
into account the characteristics of the human perceptual system, the dataset in 
question, and the task at hand. 



Figure 27.2. Substituting perception for cognition and memory allows us to understand 
relationships between book topics quickly. 

27.1 Background

27.1.1 History 

People have a long history of conveying meaning through static images, dating 
back to the oldest known cave paintings from over thirty thousand years ago. We 
continue to visually communicate today in ways ranging from rough sketches on 
the back of a napkin to the slick graphic design of advertisements. For thousands 
of years, cartographers have studied the problem of making maps that represent 
some aspect of the world around us. The first visual representations of abstract, 
nonspatial datasets were created in the 18th century by William Playfair (Friendly, 
2008). 

Although we have had the power to create moving images for over one hundred
and fifty years, creating dynamic images interactively is a more recent development
only made possible by the widespread availability of fast computer
graphics hardware and algorithms in the past few decades. Static visualizations 
of tiny datasets can be created by hand, but computer graphics enables interactive 
visualization of large datasets. 


27.1.2 Resource Limitations 

When designing a visualization system, we must consider three different kinds 
of limitations: computational capacity, human perceptual and cognitive capacity, 
and display capacity. 

As with any application of computer graphics, computer time and memory are 
limited resources and we often have hard constraints. If the visualization system 
needs to deliver interactive response, then it must use algorithms that can run in a 
fraction of a second rather than minutes or hours. 

On the human side, memory and attention must be considered as finite resources.
Human memory is notoriously limited, both for long-term recall and
for shorter-term working memory. Later in this chapter, we discuss some of the
power and limitations of the low-level visual attention mechanisms that carry out
massively parallel processing of the visual field. We store surprisingly little
information internally in visual working memory, leaving us vulnerable to change
blindness, the phenomenon where even very large changes are not noticed if we
are attending to something else in our view (Simons, 2000). Moreover, vigilance
is also a highly limited resource; our ability to perform visual search tasks
degrades quickly, with far worse results after several hours than in the first few
minutes (Ware, 2000).

Display capacity is a third kind of limitation to consider. Visualization designers
often “run out of pixels,” where the resolution of the screen is not large
enough to show all desired information simultaneously. The information density 
of a particular frame is a measure of the amount of information encoded versus 
the amount of unused space. There is a tradeoff between the benefits of showing 
as much as possible at once, to minimize the need for navigation and exploration, 
and the costs of showing too much at once, where the user is overwhelmed by 
visual clutter. 

27.2 Data Types 

Many aspects of a visualization design are driven by the type of the data that we 
need to look at. For example, is it a table of numbers, or a set of relations between 
items, or inherently spatial data such as a location on the Earth’s surface or a 
collection of documents? 

We start by considering a table of data. We call the rows items of data and the 
columns are dimensions, also known as attributes. For example, the rows might 
represent people, and the columns might be names, age, height, shirt size, and 
favorite fruit. 

We distinguish between three types of dimensions: quantitative, ordered, and 
categorical. Quantitative data, such as age or height, is numerical and we can 
do arithmetic on it. For example, the quantity of 68 inches minus 42 inches is 
26 inches. With ordered data, such as shirt size, we cannot do full-fledged arithmetic,
but there is a well-defined ordering. For example, Large minus Medium
is not a meaningful concept, but we know that Medium falls between Small and 
Large. Categorical data, such as favorite fruit or names, does not have an implicit 
ordering. We can only distinguish whether two things are the same (apples) or 
different (apples vs. bananas). 
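
The distinction between the three dimension types can be made concrete by looking at which operations are meaningful for each. The following is a minimal sketch using the examples from the text; the helper names and the explicit ordering list are assumptions introduced for illustration.

```python
# Ordered dimension: an explicit ranking, but no arithmetic.
SHIRT_SIZES = ["Small", "Medium", "Large"]


def height_difference(a_inches, b_inches):
    """Quantitative data supports arithmetic such as subtraction."""
    return a_inches - b_inches


def is_between(size, low, high):
    """Ordered data supports comparison of rank, but not subtraction."""
    rank = SHIRT_SIZES.index
    return rank(low) < rank(size) < rank(high)


def same_category(a, b):
    """Categorical data supports only a test for equality."""
    return a == b


difference = height_difference(68, 42)                 # 26 inches
medium_in_middle = is_between("Medium", "Small", "Large")
fruit_match = same_category("apples", "bananas")
```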

Relational data, or graphs, are another data type where nodes are connected by 
links. One specific kind of graph is a tree, which is typically used for hierarchical 
data. Both nodes and edges can have associated attributes. The word graph is 
unfortunately overloaded in visualization. The node-link graphs we discuss here, 
following the terminology of graph drawing and graph theory, could also be called 
networks. In the field of statistical graphics, graph is often used for chart, as in 
the line charts for time-series data shown in Figure 27.10. 

Some data is inherently spatial, such as geographic location or a field of measurements
at positions in three-dimensional space as in the MRI or CT scans used
by doctors to see the internal structure of a person’s body. The information associated
with each point in space may be an unordered set of scalar quantities,
or indexed vectors, or tensors. In contrast, non-spatial data can be visually encoded
using spatial position, but that encoding is chosen by the designer rather
than given implicitly in the semantics of the dataset itself. This choice is
one of the most central and difficult problems of visualization design.


27.2.1 Dimension and Item Count 

The number of data dimensions that need to be visually encoded is one of the most 
fundamental aspects of the visualization design problem. Techniques that work 
for a low-dimensional dataset with a few columns will often fail for very
high-dimensional datasets with dozens or hundreds of columns. A data dimension may
have hierarchical structure, for example with a time series dataset where there are 
interesting patterns at multiple temporal scales. 

The number of data items is also important: a visualization that performs well 
for a few hundred items often does not scale to millions of items. In some cases 
the difficulty is purely algorithmic, where a computation would take too long; in 
others it is an even deeper perceptual problem that even an instantaneous
algorithm could not solve, where visual clutter makes the representation unusable by
a person. The range of possible values within a dimension may also be relevant. 


27.2.2 Data Transformation and Derived Dimensions 

Data is often transformed from one type to another as part of a visualization 
pipeline for solving the domain problem. For example, an original data dimension
might be made up of quantitative data: floating point numbers that represent
temperature. For some tasks, like finding anomalies in local weather patterns, the 
raw data might be used directly. For another task, like deciding whether water is 
an appropriate temperature for a shower, the data might be transformed into an 
ordered dimension: hot, warm, or cold. In this transformation, most of the detail 
is aggregated away. In a third example, when making toast, an even more lossy 
transformation into a categorical dimension might suffice: burned or not burned. 
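
These lossy transformations can be sketched as simple binning functions. The threshold values and function names below are hypothetical, chosen only to make the example concrete; the text does not give specific cutoffs.

```python
def shower_temperature(temp_c, cold_below=30.0, hot_above=42.0):
    """Quantitative -> ordered: collapse a temperature into cold / warm / hot.

    The default cutoffs are illustrative assumptions, not values from the text.
    """
    if temp_c < cold_below:
        return "cold"
    if temp_c > hot_above:
        return "hot"
    return "warm"


def toast_state(surface_temp_c, burn_threshold=200.0):
    """Quantitative -> categorical: keep only whether the toast is burned."""
    return "burned" if surface_temp_c >= burn_threshold else "not burned"


readings = [18.5, 36.0, 47.2]
ordered = [shower_temperature(t) for t in readings]
```

Note how much detail is aggregated away: every reading between the two cutoffs becomes the same value, "warm".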

The principle of transforming data into derived dimensions, rather than simply 
visually encoding the data in its original form, is a powerful idea. In Figure 27.10, 
the original data was an ordered collection of time-series curves. The transformation
was to cluster the data, reducing the amount of information to visually encode
to a few highly meaningful curves. 

[Figure 27.3 diagram: nested layers, including “data/operation abstraction design” and “encoding/interaction technique design”.]

Figure 27.3. Four nested layers of validation for visualization.


27.3 Human-Centered Design Process 

The visualization design process can be split into a cascading set of layers, as 
shown in Figure 27.3. These layers all depend on each other; the output of the 
level above is input into the level below. 


27.3.1 Task Characterization 

A given dataset has many possible visual encodings. Choosing which visual encoding
to use can be guided by the specific needs of some intended user. Different
questions, or tasks, require very different visual encodings. For example, consider
the domain of software engineering. The task of understanding the coverage of a
test suite is well supported by the Tarantula interface shown in Figure 27.11. However,
the task of understanding the modular decomposition of the software while
refactoring the code might be better served by showing its hierarchical structure 
more directly as a node-link graph. 

Understanding the requirements of some target audience is a tricky problem. 
In a human-centered design approach, the visualization designer works with a 
group of target users over time (C. Lewis & Rieman, 1993). In most cases, users 
know they need to somehow view their data but cannot directly articulate their 
needs as clear-cut tasks in terms of operations on data types. The iterative design 
process includes gathering information from the target users about their problems 
through interviews and observation of them at work, creating prototypes, and 
observing how users interact with those prototypes to see how well the proposed 
solution actually works. The software engineering methodology of requirements 
analysis can also be useful (Kovitz, 1999). 


27.3.2 Abstraction 

After the specific domain problem has been identified in the first layer, the next 
layer requires abstracting it into a more generic representation as operations on 
the data types discussed in the previous section. Problems from very different
domains can map to the same visualization abstraction. These generic operations
include sorting, filtering, characterizing trends and distributions, finding anomalies
and outliers, and finding correlation (Amar et al., 2005). They also include
operations that are specific to a particular data type, for example following a path 
for relational data in the form of graphs or trees. 

This abstraction step often involves data transformations from the original raw 
data into derived dimensions. These derived dimensions are often of a different 
type than the original data: a graph may be converted into a tree, tabular data may 
be converted into a graph by using a threshold to decide whether a link should 
exist based on the field values, and so on. 
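
As one illustration of deriving a graph from tabular data, the sketch below links two rows whenever a chosen field differs by no more than a threshold. The similarity rule, the sample table, and the function name are all assumptions; in practice the rule would come from the domain problem.

```python
from itertools import combinations


def table_to_graph(rows, field, max_difference):
    """Return links between rows whose `field` values differ by at most max_difference.

    rows: dict mapping an item name to a dict of its attribute values.
    """
    edges = []
    for (name_a, a), (name_b, b) in combinations(rows.items(), 2):
        if abs(a[field] - b[field]) <= max_difference:
            edges.append((name_a, name_b))
    return edges


# Hypothetical table: rows are people, columns are attributes.
people = {
    "Ann": {"age": 30, "height": 66},
    "Bob": {"age": 32, "height": 70},
    "Cy": {"age": 50, "height": 68},
}
age_links = table_to_graph(people, "age", max_difference=5)
```

Changing the threshold changes the derived graph, so the choice of threshold is itself part of the abstraction design.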

27.3.3 Technique and Algorithm Design 

Once an abstraction has been chosen, the next layer is to design appropriate visual 
encoding and interaction techniques. Section 27.4 covers the principles of visual 
encoding, and we discuss interaction principles in Section 27.5. We present
techniques that take these principles into account in Sections 27.6 and 27.7. 

A detailed discussion of visualization algorithms is unfortunately beyond the 
scope of this chapter. 

27.3.4 Validation 

Each of the four layers has different validation requirements. 

The first layer is designed to determine whether the problem is correctly characterized:
is there really a target audience performing particular tasks that would
benefit from the proposed tool? An immediate way to test assumptions and conjectures
is to observe or interview members of the target audience, to ensure that
the visualization designer fully understands their tasks. A measurement that cannot
be done until a tool has been built and deployed is to monitor its adoption
rate within that community, although of course many other factors in addition to 
utility affect adoption. 

The next layer is used to determine whether the abstraction from the domain 
problem into operations on specific data types actually solves the desired problem. 
After a prototype or finished tool has been deployed, a field study can be carried
out to observe whether and how it is used by its intended audience. Also, images 
produced by the system can be analyzed both qualitatively and quantitatively. 

The purpose of the third layer is to verify that the visual encoding and interaction
techniques chosen by the designer effectively communicate the chosen
abstraction to the users. An immediate test is to justify that individual design
choices do not violate known perceptual and cognitive principles. Such a justification
is necessary but not sufficient, since visualization design involves many
tradeoffs between interacting choices. After a system is built, it can be tested 
through formal laboratory studies where many people are asked to do assigned 
tasks so that measurements of the time required for them to complete the tasks 
and their error rates can be statistically analyzed. 

A fourth layer is employed to verify that the algorithm designed to carry out 
the encoding and interaction choices is faster or takes less memory than previous 
algorithms. An immediate test is to analyze the computational complexity of 
the proposed algorithm. After implementation, the actual time performance and 
memory usage of the system can be directly measured. 


27.4 Visual Encoding Principles 

We can describe visual encodings as graphical elements, called marks, that convey
information through visual channels. A zero-dimensional mark is a point, a
one-dimensional mark is a line, a two-dimensional mark is an area, and a
three-dimensional mark is a volume. Many visual channels can encode information,
including spatial position, color, size, shape, orientation, and direction of motion.
Multiple visual channels can be used to simultaneously encode different
data dimensions; for example, Figure 27.4 shows the use of horizontal and vertical
spatial position, color, and size to display four data dimensions. More than
one channel can be used to redundantly code the same dimension, for a design
that displays less information but shows it more clearly.

Figure 27.4. The four visual channels of horizontal and vertical spatial position, color,
and size are used to encode information in this scatterplot chart. Image courtesy George
Robertson (Robertson et al., 2008), © IEEE 2008.


27.4.1 Visual Channel Characteristics 

Important characteristics of visual channels are distinguishability, separability, 
and popout. 

Channels are not all equally distinguishable. Many psychophysical experiments
have been carried out to measure the ability of people to make precise
distinctions about information encoded by the different visual channels. Our
abilities depend on whether the data type is quantitative, ordered, or categorical.
Figure 27.5 shows the rankings of visual channels for the three data types.
Figure 27.6 shows some of the default mappings for visual channels in the
Tableau/Polaris system, which take into account the data type. 

Spatial position is the most accurate visual channel for all three types of data, 
and it dominates our perception of a visual encoding. Thus, the two most important
data dimensions are often mapped to horizontal and vertical spatial positions.

However, the other channels differ strongly between types. The channels of 
length and angle are highly discriminable for quantitative data but poor for ordered
and categorical, while in contrast hue is very accurate for categorical data
but mediocre for quantitative data. 

Figure 27.5. Our ability to perceive information encoded by a visual channel depends on
the type of data used, from most accurate at the top to least at the bottom. Redrawn and
adapted from (Mackinlay, 1986).

Figure 27.6. The Tableau/Polaris system default mappings for four visual channels according
to data type. Image courtesy Chris Stolte (Stolte et al., 2008), © 2008 IEEE. (See also
Plate XLII.)

Figure 27.7. Color and location are separable channels well suited to encode different
data dimensions, but the horizontal size and vertical size channels are automatically
fused into an integrated perception of area. Redrawn after (Ware, 2000).

We must always consider whether there is a good match between the dynamic
range necessary to show the data dimension and the dynamic range available in the
channel. For example, encoding with line width uses a one-dimensional mark and 
the size channel. There are a limited number of width steps that we can reliably 
use to visually encode information: a minimum thinness of one pixel is enforced 
by the screen resolution (ignoring antialiasing to simplify this discussion), and 
there is a maximum thickness beyond which the object will be perceived as a 
polygon rather than a line. Line width can work very well to show three or four 
different values in a data dimension, but it would be a poor choice for dozens or 
hundreds of values. 
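
The limited dynamic range of line width suggests quantizing a data dimension into a handful of widths rather than mapping it continuously. The sketch below does this for four steps; the particular pixel widths and the function name are assumptions chosen for illustration.

```python
def line_width_for(value, lo, hi, widths=(1, 2, 4, 6)):
    """Map a value in [lo, hi] to one of a few clearly distinguishable line widths.

    widths is given in pixels; the default four steps are an illustrative choice.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    t = (value - lo) / (hi - lo)
    t = min(max(t, 0.0), 1.0)  # clamp out-of-range values
    index = min(int(t * len(widths)), len(widths) - 1)
    return widths[index]


thin = line_width_for(0, 0, 100)
middle = line_width_for(50, 0, 100)
thick = line_width_for(100, 0, 100)
```

With only four output widths, a dimension with dozens of distinct values would collapse many of them onto the same width, which is why this channel suits only a small number of levels.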

Some visual channels are integral, fused together at a pre-conscious level, so 
they are not good choices for visually encoding different data dimensions. Others 
are separable, without interactions between them during visual processing, and 
are safe to use for encoding multiple dimensions. Figure 27.7 shows two channel 
pairs. Color and position are highly separable. We can see that horizontal size and 
vertical size are not so easy to separate, because our visual system automatically 
integrates these together into a unified perception of area. Size interacts with 
many channels: as the size of an object grows smaller, it becomes more difficult 
to distinguish its shape or color. 

We can selectively attend to a channel so that items of a particular type “pop 
out” visually, as discussed in Section 22.4.3. An example of visual popout is 
when we immediately spot the red item amidst a sea of blue ones, or distinguish 
the circle from the squares. Visual popout is powerful and scalable because it 
occurs in parallel, without the need for conscious processing of the items one 
by one. Many visual channels have this popout property, including not only the 
list above but also curvature, flicker, stereoscopic depth, and even the direction 
of lighting. However, in general we can only take advantage of popout for one 
channel at a time. For example, a white circle does not pop out from a group of 
circles and squares that can be white or black, as shown in Figure 22.43. When we 
need to search across more than one channel simultaneously, the length of time
it takes to find the target object depends linearly on the number of objects in the 
scene. 


27.4.2 Color 

Color can be a very powerful channel, but many people do not understand its 
properties and use it improperly. As discussed in Section 22.2.2, we can consider 
color in terms of three separate visual channels: hue, saturation, and lightness. 
Region size strongly affects our ability to sense color. Color in small regions is 
relatively difficult to perceive, and designers should use bright, highly saturated 
colors to ensure that the color coding is distinguishable. The inverse situation 
is true when colored regions are large, as in backgrounds, where low saturation 
pastel colors should be used to avoid blinding the viewer. 

Hue is a very strong cue for encoding categorical data. However, the available 
dynamic range is very limited. People can reliably distinguish only around a 
dozen hues when the colored regions are small and scattered around the display. 
A good guideline for color coding is to keep the number of categories less than 8, 
keeping in mind that the background and the neutral object color also count in the 
total. 

For ordered data, lightness and saturation are effective because they have an 
implicit perceptual ordering. People can reliably order by lightness, always placing
gray in between black and white. With saturation, people reliably place the
less saturated pink between fully saturated red and zero-saturation white. However,
hue is not as good a channel for ordered data because it does not have
an implicit perceptual ordering. When asked to create an ordering of red, blue, 
green, and yellow, people do not all give the same answer. People can and do learn 
conventions, such as green-yellow-red for traffic lights, or the order of colors in 
the rainbow, but these constructions are at a higher level than pure perception. 
Ordered data is typically shown with a discrete set of color values. 

Quantitative data is shown with a colormap, a range of color values that can 
be continuous or discrete. A very unfortunate default in many software packages 
is the rainbow colormap, as shown in Figure 27.8. The standard rainbow scale 
suffers from two problems. First, hue is used to indicate order. A better choice
would be to use lightness because it has an implicit perceptual ordering. Even 
more importantly, the human eye responds most strongly to luminance. Second, 
the scale is not perceptually linear: equal steps in the continuous range are not 
perceived as equal steps by our eyes. Figure 27.8 shows an example, where the 
rainbow colormap obfuscates the data. While the range from -2000 to -1000
has three distinct colors, cyan, green, and yellow, a range of the same size from
-1000 to 0 simply looks yellow throughout. The graphs on the right show that the
perceived value is strongly tied to the luminance, which is not even monotonically
increasing in this scale.

Figure 27.8. The standard rainbow colormap has two defects: it uses hue to denote
ordering, and it is not perceptually isolinear. Image courtesy Bernice Rogowitz. (See also
Plate XLIV.)

In contrast, Figure 27.9 shows the same data with a more appropriate colormap,
where the lightness increases monotonically. Hue is used to create a
semantically meaningful categorization: the viewer can discuss structure in the 
dataset, such as the dark blue sea, the cyan continental shelf, the green lowlands, 
and the white mountains. 
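
The monotonicity argument can be checked numerically. The following is a minimal sketch, not the method used to produce the figures: it assumes a rainbow ramp built by sweeping HSV hue from blue to red, and it approximates perceived brightness with the Rec. 709 luma weights applied directly to the RGB values. The function names are hypothetical.

```python
import colorsys

def luma(rgb):
    """Rough brightness estimate using Rec. 709 weights (an assumption;
    a careful analysis would work in a perceptual color space)."""
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def rainbow(n):
    """Sample a rainbow ramp by sweeping hue from blue (240 deg) to red (0 deg)."""
    return [colorsys.hsv_to_rgb(240.0 * (1 - i / (n - 1)) / 360.0, 1.0, 1.0)
            for i in range(n)]

def gray(n):
    """Sample a gray ramp from black to white."""
    return [(i / (n - 1),) * 3 for i in range(n)]

def is_monotonic_increasing(values):
    """True when every value is strictly larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))

rainbow_luma = [luma(c) for c in rainbow(9)]
gray_luma = [luma(c) for c in gray(9)]
```

Under these assumptions the gray ramp passes the monotonicity check and the rainbow ramp does not, because brightness rises toward cyan, dips at green, peaks at yellow, and falls again toward red.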



Figure 27.9. The structure of the same dataset is far more clear with a colormap where 
monotonically increasing lightness is used to show ordering and hue is used instead for 
segmenting into categorical regions. Image courtesy Bernice Rogowitz. (See also Plate 
XLIV.)

In both the discrete and continuous cases, colormaps should take into account
whether the data is sequential or diverging. The ColorBrewer application
(www.colorbrewer.org) is an excellent resource for colormap construction (Brewer, 1999).

Another important issue when encoding with color is that a significant fraction 
of the population, roughly 10% of men, is red-green color deficient. If a coding 
using red and green is chosen because of conventions in the target domain,
redundantly coding lightness or saturation in addition to hue is wise. Tools such as
the web site http://www.vischeck.com should be used to check whether a color
scheme is distinguishable to people with color-deficient vision.

27.4.3 2D vs. 3D Spatial Layouts 

The question of whether to use two or three channels for spatial position has been 
extensively studied. When computer-based visualization began in the late 1980s, 
and interactive 3D graphics was a new capability, there was a lot of enthusiasm 
for 3D representations. As the field matured, researchers began to understand the 
costs of 3D approaches when used for abstract datasets (Ware, 2001). 

Occlusion, where some parts of the dataset are hidden behind others, is a 
major problem with 3D. Although hidden surface removal algorithms such as
Z-buffers and BSP trees allow fast computation of a correct 2D image, people must
still synthesize many of these images into an internal mental map. When people
look at realistic scenes made from familiar objects, usually they can quickly
understand what they see. However, when they see an unfamiliar dataset, where
a chosen visual encoding maps abstract dimensions into spatial positions,
understanding the details of its 3D structure can be challenging even when they can use
interactive navigation controls to change their 3D viewpoint. The reason is once
again the limited capacity of human working memory (Plumlee & Ware, 2006). 

Another problem with 3D is perspective distortion. Although real-world objects
do indeed appear smaller when they are further from our eyes, foreshortening
makes direct comparison of object heights difficult (Tory et al., 2006). Once
again, although we can often judge the heights of familiar objects in the real world
based on past experience, we cannot necessarily do so with completely abstract
data that has a visual encoding where the height conveys meaning. For example,
it is more difficult to judge bar heights in a 3D bar chart than in multiple
horizontally aligned 2D bar charts. 

Another problem with unconstrained 3D representations is that text at arbitrary
orientations in 3D space is far more difficult to read than text aligned in the
2D image plane (Grossman et al., 2007). 

Figure 27.10 illustrates how carefully chosen 2D views of an abstract dataset
can avoid the problems with occlusion and perspective distortion inherent in 3D
views. The top view shows a 3D representation created directly from the original
time-series data, where each cross-section is a 2D time-series curve showing
power consumption for one day, with one curve for each day of the year along the
extruded third axis. Although this representation is straightforward to create, we
can only see large-scale patterns such as the higher consumption during working
hours and the seasonal variation between winter and summer. To create the 2D
linked views at the bottom, the curves were hierarchically clustered, and only
aggregate curves representing the top clusters are drawn superimposed in the same
2D frame. Direct comparison between the curve heights at all times of the day
is easy because there is no perspective distortion or occlusion. The same color
coding is used in the calendar view, which is very effective for understanding
temporal patterns.

Figure 27.10. Left: A 3D representation of this time series dataset introduces the
problems of occlusion and perspective distortion. Right: The linked 2D views of derived
aggregate curves and the calendar allow direct comparison and show more fine-grained patterns.
Image courtesy Jarke van Wijk (van Wijk & van Selow, 1999), © 1999 IEEE. (See also
Plate XLV.)

In contrast, if a dataset consists of inherently 3D spatial data, such as showing 
fluid flow over an airplane wing or a medical imaging dataset from an MRI scan, 
then the costs of a 3D view are outweighed by its benefits in helping the user 
construct a useful mental model of the dataset structure. 

27.4.4 Text Labels 

Text in the form of labels and legends is a very important factor in creating
visualizations that are useful rather than simply pretty. Axes and tick marks should be
labelled. Legends should indicate the meaning of colors, whether used as discrete
patches or in continuous color ramps. Individual items in a dataset typically have
meaningful text labels associated with them. In many cases showing all labels
at all times would result in too much visual clutter, so labels can be shown for
a subset of the items using label positioning algorithms that show labels at a
desired density while avoiding overlap (Luboschik et al., 2008). A straightforward
way to choose the best label to represent a group of items is to use a greedy
algorithm based on some measure of label importance, but synthesizing a new label
based on the characteristics of the group remains a difficult problem. A more
interaction-centric approach is to only show labels for individual items based on
an interactive indication from the user.
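
To make the greedy idea concrete, here is a minimal sketch that places labels in decreasing order of importance and skips any label whose bounding box would overlap one already placed. It is an illustration only and is not the algorithm of Luboschik et al.; the importance scores, rectangles, and function names are all hypothetical.

```python
def overlaps(a, b):
    """Test two axis-aligned rectangles given as (x, y, width, height)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def greedy_labels(candidates):
    """candidates: list of (importance, text, rect).
    Visit labels from most to least important and keep a label only if
    its rectangle does not overlap any label that was already kept."""
    placed = []
    for importance, text, rect in sorted(candidates, key=lambda c: -c[0]):
        if not any(overlaps(rect, r) for _, r in placed):
            placed.append((text, rect))
    return [text for text, _ in placed]

labels = [
    (0.9, "Vancouver", (10, 10, 60, 12)),
    (0.5, "Burnaby", (40, 15, 50, 12)),   # overlaps the Vancouver label
    (0.7, "Victoria", (10, 40, 55, 12)),
]
shown = greedy_labels(labels)
```

Here the less important label that collides with a more important one is dropped, which controls clutter at the cost of hiding some items until the user zooms or hovers.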

27.5 Interaction Principles 

Several principles of interaction are important when designing a visualization. 
Low-latency visual feedback allows users to explore more fluidly, for example 
by showing more detail when the cursor simply hovers over an object rather than 
requiring the user to explicitly click. Selecting items is a fundamental operation 
when interacting with large datasets, as is visually indicating the selected set with 
highlighting. Color coding is a common form of highlighting, but other channels 
can also be used. 

Many forms of interaction can be considered in terms of what aspect of the 
display they change. Navigation can be considered a change of viewport. Sorting 
is a change to the spatial ordering; that is, changing how data is mapped to the 
spatial position visual channel. The entire visual encoding can also be changed. 

27.5.1 Overview First, Zoom and Filter, Details on Demand 

The influential mantra “Overview first, zoom and filter, details on demand”
(Shneiderman, 1996) elucidates the role of interaction and navigation in visualization
design. Overviews help the user notice regions where further investigation might
be productive, whether through spatial navigation or through filtering. As we
discuss below, details can be presented in many ways: with popups from clicking or
cursor hovering, in a separate window, and by changing the layout on the fly to 
make room to show additional information. 


27.5.2 Interactivity Costs 

Interactivity has both power and cost. The benefit of interaction is that people can 
explore a larger information space than can be understood in a single static image.
However, a cost to interaction is that it requires human time and attention. If the
user must exhaustively check every possibility, use of the visualization system 
may degenerate into human-powered search. Automatically detecting features 
of interest to explicitly bring to the user’s attention via the visual encoding is a 
useful goal for the visualization designer. However, if the task at hand could be 
completely solved by automatic means, there would be no need for a visualization 
in the first place. Thus, there is always a tradeoff between finding automatable 
aspects and relying on the human in the loop to detect patterns. 


27.5.3 Animation 

Animation shows change using time. We distinguish animation, where successive
frames can only be played, paused, or stopped, from true interactive control.
There is considerable evidence that animated transitions can be more effective
than jump cuts, by helping people track changes in object positions or camera
viewpoints (Heer & Robertson, 2007). Although animation can be very effective
for narrative and storytelling, it is often used ineffectively in a visualization
context (Tversky et al., 2002). It might seem obvious to show data that changes
over time by using animation, a visual modality that changes over time. However,
people have difficulty in making specific comparisons between individual
frames that are not contiguous when they see an animation consisting of many
frames. The very limited capacity of human visual memory means that we are
much worse at comparing memories of things that we have seen in the past than
at comparing things that are in our current field of view. For tasks requiring
comparison between up to several dozen frames, side-by-side comparison is often
more effective than animation. Moreover, if the number of objects that change 
between frames is large, people will have a hard time tracking everything that 
occurs (Robertson et al., 2008). Narrative animations are carefully designed to 
avoid having too many actions occurring simultaneously, whereas a dataset being 
visualized has no such constraint. For the special case of just two frames with a 
limited amount of change, the very simple animation of flipping back and forth 
between the two can be a useful way to identify the differences between them. 


27.6 Composite and Adjacent Views 

A very fundamental visual encoding choice is whether to have a single composite 
view showing everything in the same frame or window, or to have multiple views 
adjacent to each other.


27.6.1 Single Drawing

When there are only one or two data dimensions to encode, then horizontal and
vertical spatial position are the obvious visual channels to use, because we perceive
them most accurately and position has the strongest influence on our internal
mental model of the dataset. The traditional statistical graphics displays of line charts,
bar charts, and scatterplots all use spatial ordering of marks to encode
information. These displays can be augmented with additional visual channels, such as
color and size and shape, as in the scatterplot shown in Figure 27.4. 

The simplest possible mark is a single pixel. In pixel-oriented displays, the 
goal is to provide an overview of as many items as possible. These approaches use 
the spatial position and color channels at a high information density, but preclude 
the use of the size and shape channels. Figure 27.11 shows the Tarantula software 
visualization tool (Jones et al., 2002), where most of the screen is devoted to an
overview of source code using one-pixel high lines (Eick et al., 1992). The color 
and brightness of each line shows whether it passed, failed, or had mixed results 
when executing a suite of test cases. 



Figure 27.11. Tarantula shows an overview of source code using one-pixel lines color 
coded by execution status of a software test suite. Image courtesy John Stasko (Jones et 
al., 2002). (See also Plate XLVI.)

Figure 27.12. Visual layering with size, saturation, and brightness in the Constellation
system (Munzner, 2000). (See also Plate XLVII.) 


27.6.2 Superimposing and Layering 

Multiple items can be superimposed in the same frame when their spatial position 
is compatible. Several lines can be shown in the same line chart, and many dots in 
the same scatterplot, when the axes are shared across all items. One benefit of a 
single shared view is that comparing the position of different items is very easy. If 
the number of items in the dataset is limited, then a single view will often suffice. 
Visual layering can extend the usefulness of a single view when there are enough 
items that visual clutter becomes a concern. Figure 27.12 shows how a redundant 
combination of the size, saturation, and brightness channels serves to distinguish 
a foreground layer from a background layer when the user moves the cursor over 
a block of words. 


27.6.3 Glyphs 

We have been discussing the idea of visual encoding using simple marks, where 
a single mark can only have one value for each visual channel used. With more
complex marks, which we will call glyphs, there is internal structure where
subregions have different visual channel encodings.

[Figure 27.13 panel titles: Variations on Profile glyphs; Stars and Anderson/metroglyphs; Sticks and Trees]



Figure 27.13. Complex marks, which we call glyphs, have subsections that visually encode 
different data dimensions. Image courtesy Matt Ward (M. O. Ward, 2002). 


Designing appropriate glyphs has the same challenges as designing visual
encodings. Figure 27.13 shows a variety of glyphs, including the notorious faces
originally proposed by Chernoff. The danger of using faces to show abstract data
dimensions is that our perceptual and emotional response to different facial
features is highly nonlinear in a way that is not fully understood, but the variability
is greater than between the visual channels that we have discussed so far. We
are probably far more attuned to features that indicate emotional state such as
eyebrow orientation than other features such as nose size or face shape.

Figure 27.14. Complex glyphs require significant display area so that the encoded
information can be read. Image courtesy Matt Ward, created with the SpiralGlyphics software (M. O.
Ward, 2002). (See also Plate XLIII.)

Figure 27.15. A dense array of simple glyphs. Image courtesy Georges Grinstein (S. Smith
et al., 1991), © 1991 IEEE.

Complex glyphs require significant display area for each glyph, as shown in
Figure 27.14 where miniature bar charts show the value of four different
dimensions at many points along a spiral path. Simpler glyphs can be used to create
a global visual texture: the glyph size is so small that individual values cannot
be read out without zooming, but region boundaries can be discerned from the
overview level. Figure 27.15 shows an example using stick figures of the kind in
the upper right in Figure 27.13. Glyphs may be placed at regular intervals, or in 
data-driven spatial positions using an original or derived data dimension. 


27.6.4 Multiple Views 

We now turn from approaches with only a single frame to those which use
multiple views that are linked together. The most common form of linkage is linked
highlighting, where items selected in one view are highlighted in all others. In 
linked navigation, movement in one view triggers movement in the others. 
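
Linked highlighting is commonly implemented with a selection model shared by all views. The sketch below assumes that design; the class names `LinkedSelection` and `View` are hypothetical stand-ins rather than the API of any particular toolkit.

```python
class LinkedSelection:
    """Shared selection model: every registered view is notified on change."""

    def __init__(self):
        self.selected = set()
        self.views = []

    def register(self, view):
        self.views.append(view)

    def select(self, item_ids):
        """Replace the selection and push it to all registered views."""
        self.selected = set(item_ids)
        for view in self.views:
            view.highlight(self.selected)


class View:
    """Stand-in for a real view; it only records which of its items are lit."""

    def __init__(self, name, item_ids):
        self.name = name
        self.item_ids = set(item_ids)
        self.highlighted = set()

    def highlight(self, selected):
        # A view can only highlight the selected items that it actually shows.
        self.highlighted = self.item_ids & selected


selection = LinkedSelection()
scatter = View("scatterplot", [1, 2, 3, 4])
table = View("table", [3, 4, 5])
selection.register(scatter)
selection.register(table)
selection.select([2, 3])   # a selection made in any one view
```

Because the views share item identifiers rather than screen positions, a selection that is contiguous in one encoding can light up scattered items in another.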

There are many kinds of multiple-view approaches. In what is usually called 
simply the multiple-view approach, the same data is shown in several views, each 
of which has a different visual encoding that shows certain aspects of the dataset
most clearly. The power of linked highlighting across multiple visual encodings
is that items that fall in a contiguous region in one view are often distributed very 
differently in the other views. In the small-multiples approach, each view has 
the same visual encoding for different datasets, usually with shared axes between 
frames so that comparison of spatial position between them is meaningful.
Side-by-side comparison with small multiples is an alternative to the visual clutter of
superimposing all the data in the same view, and to the human memory limitations 
of remembering previously seen frames in an animation that changes over time. 

The overview-and-detail approach is to have the same data and the same visual 
encoding in two views, where the only difference between them is the level of 
zooming. In most cases, the overview uses much less display space than the 
detail view. The combination of overview and detail views is common outside 
of visualization in many tools ranging from mapping software to photo editing. 
With a detail-on-demand approach, another view shows more information about 
some selected item, either as a popup window near the cursor or in a permanent 
window in another part of the display.

Figure 27.16. The Improvise toolkit was used to create this multiple-view visualization.
Image courtesy Chris Weaver. (See also Plate XLVIII.)

Determining the most appropriate spatial position of the views themselves 
with respect to each other can be as significant a problem as determining the 
spatial position of marks within a single view. In some systems, the location of the 
views is arbitrary and left up to the window system or the user. Aligning the views 
allows precise comparison between them, either vertically, horizontally, or with 
an array for both directions. Just as items can be sorted within a view, views can 
be sorted within a display, typically with respect to a derived variable measuring 
some aspect of the entire view as opposed to an individual item within it. 

Figure 27.16 shows a visualization of census data that uses many views. In 
addition to geographic information, the demographic information for each county 
includes population, density, gender, median age, percent change since 1990, 
and proportions of major ethnic groups. The visual encodings used include
geographic, scatterplot, parallel coordinate, tabular, and matrix views. The same
color encoding is used across all the views, with a legend in the bottom
middle. The scatterplot matrix shows linked highlighting across all views, where
the blue items are close together in some views and scattered in others. The
map in the upper-left corner is an overview for the large detail map in the
center. The tabular views allow direct sorting by and selection within a dimension
of interest.


27.7 Data Reduction 

The visual encoding techniques that we have discussed so far show all of the items 
in a dataset. However, many datasets are so large that showing everything
simultaneously would result in so much visual clutter that the visual representation
would be difficult or impossible for a viewer to understand. The main strategies 
to reduce the amount of data shown are overviews and aggregation, filtering and 
navigation, the focus+context techniques, and dimensionality reduction. 


27.7.1 Overviews and Aggregation 

With tiny datasets, a visual encoding can easily show all data dimensions for all 
items. For datasets of medium size, an overview that shows information about 
all items can be constructed by showing less detail for each item. Many datasets 
have internal or derivable structure at multiple scales. In these cases, a multiscale 
visual representation can provide many levels of overview, rather than just a single 
level. Overviews are typically used as a starting point to give users clues about 
where to drill down to inspect in more detail.

For larger datasets, creating an overview requires some kind of visual
summarization. One approach to data reduction is to use an aggregate representation
where a single visual mark in the overview explicitly represents many items.
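
As a minimal sketch of aggregation, the following groups a one-dimensional set of values into fixed-width bins, so that each bin can be drawn as one mark whose size or color reflects a count or mean. The fixed-width binning and the function name `aggregate` are assumptions made for illustration; real systems often use data-driven clustering instead.

```python
from collections import defaultdict

def aggregate(values, bin_width):
    """Group values into fixed-width bins and summarize each bin.
    Each returned dict describes one aggregate mark."""
    bins = defaultdict(list)
    for v in values:
        bins[int(v // bin_width)].append(v)
    return [
        {"start": k * bin_width, "count": len(vs), "mean": sum(vs) / len(vs)}
        for k, vs in sorted(bins.items())
    ]

marks = aggregate([1, 2, 3, 11, 12, 25], bin_width=10)
```

Six items become three marks here. The summary statistics chosen determine which signals survive: a mean hides outliers that a minimum and maximum would preserve.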

The challenge of aggregation is to avoid eliminating the interesting signals 
in the dataset in the process of summarization. In the cartographic literature, the
problem of creating maps at different scales while retaining the important
distinguishing characteristics has been extensively studied under the name of
cartographic generalization (Slocum et al., 2008).


27.7.2 Filtering and Navigation 

Another approach to data reduction is to filter the data, showing only a subset of 
the items. Filtering is often carried out by directly selecting ranges of interest in 
one or more of the data dimensions. 
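
Range-based filtering of this kind can be sketched in a few lines. The example assumes items stored as dictionaries keyed by dimension name and inclusive ranges; the data values and the name `filter_items` are hypothetical.

```python
def filter_items(items, ranges):
    """Keep items whose value lies inside [lo, hi] for every listed dimension.
    Dimensions that are not listed in `ranges` are left unconstrained."""
    return [
        item for item in items
        if all(lo <= item[dim] <= hi for dim, (lo, hi) in ranges.items())
    ]

counties = [
    {"name": "A", "population": 12000, "median_age": 31},
    {"name": "B", "population": 450000, "median_age": 35},
    {"name": "C", "population": 60000, "median_age": 52},
    {"name": "D", "population": 80000, "median_age": 38},
]
visible = filter_items(
    counties,
    {"population": (10000, 100000), "median_age": (30, 40)},
)
```

In an interactive system the ranges would typically be driven by sliders or brushes, with the filtered set redrawn after each change.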

Navigation is a specific kind of filtering based on spatial position, where
changing the viewpoint changes the visible set of items. Both geometric and
nongeometric zooming are used in visualization. With geometric zooming, the camera
position in 2D or 3D space can be changed with standard computer graphics
controls. In a realistic scene, items should be drawn at a size that depends on their
distance from the camera, and only their apparent size changes based on that
distance. However, in a visual encoding of an abstract space, nongeometric zooming
can be useful. In semantic zooming, the visual appearance of an object changes
dramatically based on the number of pixels available to draw it. For instance, an
abstract visual representation of a text file could change from a tiny color-coded
box with no label to a medium-sized box containing only the filename as a text
label to a large rectangle containing a multi-line summary of the file contents. In
realistic scenes, objects that are sufficiently far away from the camera are not
visible in the images, for example, after they subtend less than one pixel of screen
area. With guaranteed visibility, one of the original or derived data dimensions is
used as a measure of importance, and objects of sufficient importance must have
some kind of representation visible in the image plane at all times.
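
The text-file example of semantic zooming can be sketched as a function that picks a representation from the pixels available. The pixel thresholds, the assumed 12-pixel line height, and the name `render_file` are arbitrary choices for illustration, not values from any real system.

```python
LINE_HEIGHT_PX = 12  # assumed height of one line of text

def render_file(name, summary, width_px, height_px):
    """Choose a representation for a text file from its on-screen size."""
    if width_px < 40 or height_px < LINE_HEIGHT_PX:
        # Too small for any text: draw only a color-coded box.
        return {"kind": "box"}
    if height_px < 5 * LINE_HEIGHT_PX:
        # Room for one line of text: show the filename only.
        return {"kind": "labeled_box", "label": name}
    # Large rectangle: show as many summary lines as fit.
    max_lines = height_px // LINE_HEIGHT_PX
    return {
        "kind": "summary",
        "label": name,
        "lines": summary.splitlines()[:max_lines],
    }

tiny = render_file("notes.txt", "first line\nsecond line", 10, 10)
medium = render_file("notes.txt", "first line\nsecond line", 100, 20)
large = render_file("notes.txt", "first line\nsecond line", 200, 120)
```

The key difference from geometric zooming is that the drawing code branches on available space rather than simply scaling one fixed picture.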


27.7.3 Focus+Context 

Focus+context techniques are another approach to data reduction. A subset of the 
dataset items are interactively chosen by the user to be the focus and are drawn 
in detail. The visual encoding also includes information about some or all of the 
rest of the dataset shown for context, integrated into the same view that shows the
focus items. Many of these techniques use carefully chosen distortion to combine
magnified focus regions and minified context regions into a unified view. 

One common interaction metaphor is a moveable fisheye lens. Hyperbolic
geometry provides an elegant mathematical framework for a single radial lens
that affects all objects in the view. Another interaction metaphor is to use
multiple lenses of different shapes and magnification levels that affect only local
regions. Stretch and squish navigation uses the interaction metaphor of a rubber
sheet where stretching one region squishes the rest, as shown in Figure 27.17.
The borders of the sheet stay fixed so that all items are within the viewport,
although many items may be compressed to subpixel size. The fisheye metaphor
is not limited to a geometric lens used after spatial layout; it can be used directly
on structured data, such as a hierarchical document where some sections are
collapsed while others are left expanded.
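
A one-dimensional fisheye distortion illustrates how a focus region can be magnified while the borders stay fixed. The sketch below assumes one commonly used transfer function, g(t) = (d + 1)t / (dt + 1), where t is the normalized distance from the focus and d is a distortion factor. It is not the hyperbolic formulation mentioned above, and the function name and default parameters are hypothetical.

```python
def fisheye(x, focus, d=3.0, lo=0.0, hi=1.0):
    """Map position x in [lo, hi] so the region near `focus` is magnified.
    The endpoints lo and hi map to themselves, so nothing leaves the view."""
    if x == focus:
        return focus
    boundary = hi if x > focus else lo
    span = boundary - focus
    if span == 0:
        return x                      # x lies outside the view; leave it alone
    t = (x - focus) / span            # normalized distance from focus, in [0, 1]
    g = (d + 1) * t / (d * t + 1)     # stretch near t = 0, squish near t = 1
    return focus + g * span

near = fisheye(0.6, focus=0.5)        # pushed away from the focus
left_edge = fisheye(0.0, focus=0.5)   # border stays fixed
right_edge = fisheye(1.0, focus=0.5)  # border stays fixed
```

With d = 0 the mapping is the identity; larger values of d give stronger magnification at the focus and correspondingly stronger compression near the borders.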

These distortion-based approaches are another example of non-literal
navigation in the same spirit as nongeometric zooming. When navigating within a
large and unfamiliar dataset with realistic camera motion, users can become
disoriented at high zoom levels when they can see only a small local region. These
approaches are designed to provide more contextual information than a single
undistorted view, in hopes that people can stay oriented if landmarks remain
recognizable. However, these kinds of distortion can still be confusing or difficult
to follow for users. The costs and benefits of distortion, as opposed to multiple
views or a single realistic view, are not yet fully understood. Standard 3D
perspective is a particularly familiar kind of distortion and was explicitly used as a
form of focus+context in early visualization work. However, as the costs of 3D
spatial layout discussed in Section 27.4 became better understood, this approach
became less popular.

Figure 27.17. The TreeJuxtaposer system features stretch and squish navigation and
guaranteed visibility of regions marked with colors (Munzner et al., 2003). (See also Plate XLIX.)

Figure 27.18. The SpaceTree system shows the path between the root and the interactively
chosen focus node to provide context (Grosjean et al., 2002).

Other approaches to providing context around focus items do not require
distortion. For instance, the SpaceTree system shown in Figure 27.18 elides most
nodes in the tree, showing the path between the interactively chosen focus node 
and the root of the tree for context. 

27.7.4 Dimensionality Reduction 

The data reduction approaches covered so far reduce the number of items to 
draw. When there are many data dimensions, dimensionality reduction can also be 
effective.

With slicing, a single value is chosen from the dimension to eliminate, and
only the items matching that value for the dimension are extracted to include in 
the lower-dimensional slice. Slicing is particularly useful with 3D spatial data, for 
example when inspecting slices through a CT scan of a human head at different 
heights along the skull. Slicing can be used to eliminate multiple dimensions at 
once. 

With projection, no information about the eliminated dimensions is retained; 
the values for those dimensions are simply dropped, and all items are still shown. 
A familiar form of projection is the standard graphics perspective transformation 
which projects from 3D to 2D, losing information about depth along the way. In 
mathematical visualization, the structure of higher-dimensional geometric objects 
can be shown by projecting from 4D to 3D before the standard projection to the 
image plane and using color to encode information from the projected-away
dimension. This technique is sometimes called dimensional filtering when it is used
for nonspatial data. 
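
The difference between slicing and projection can be seen in a small sketch that operates on items stored as coordinate tuples. The projection here simply drops a coordinate (an orthographic projection, a simplifying assumption), rather than applying the perspective transformation mentioned above; the function names are hypothetical.

```python
def slice_items(items, dim, value):
    """Slicing: keep only the items whose coordinate `dim` equals `value`,
    then drop that coordinate. Items off the slice are discarded."""
    return [item[:dim] + item[dim + 1:] for item in items if item[dim] == value]

def project_items(items, dim):
    """Projection: drop coordinate `dim` from every item. All items are
    kept, but information about the dropped dimension is lost."""
    return [item[:dim] + item[dim + 1:] for item in items]

voxels = [(0, 0, 0), (1, 0, 0), (0, 1, 1), (1, 1, 1)]
z_slice = slice_items(voxels, dim=2, value=1)
flattened = project_items(voxels, dim=2)
```

Slicing reduces the number of items as well as the number of dimensions, while projection reduces only the dimensions, so projected items that differed only in the dropped coordinate land on top of each other.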

In some datasets, there may be interesting hidden structure in a much
lower-dimensional space than the number of original data dimensions. For instance,
sometimes directly measuring the independent variables of interest is difficult or
impossible, but a large set of dependent or indirect variables is available. The goal
is to find a small set of dimensions that faithfully represent most of the structure or
variance in the dataset. These dimensions may be the original ones, or synthesized
new ones that are linear or nonlinear combinations of the originals. Principal
component analysis is a fast, widely used linear method. Many nonlinear approaches
have been proposed, including multidimensional scaling (MDS). These methods
are usually used to determine whether there are large-scale clusters in the dataset;
the fine-grained structure in the lower-dimensional plots is usually not reliable
because information is lost in the reduction. Figure 27.19 shows a document
collection in a single scatterplot. When the true dimensionality of the dataset is far
higher than two, a matrix of scatterplots showing pairs of synthetic dimensions
may be necessary.

Figure 27.19. Dimensionality reduction with the Glimmer multidimensional scaling approach
shows clusters in a document dataset (Ingram et al., 2009), © 2009 IEEE. (See also Plate L.)
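
As an illustration of the linear case, the sketch below finds the first principal component of a small dataset by power iteration on the covariance matrix. It uses only the standard library and is meant to show the idea, not to replace a numerical library: it assumes at least two rows, it finds only one component, and it can fail to converge if the starting vector happens to be orthogonal to that component. The function name is hypothetical.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (direction, scores): the unit vector of greatest variance and
    each row's coordinate along it after centering."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(m)]
           for a in range(m)]
    v = [1.0] * m  # arbitrary starting vector
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # no variance left in this direction
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(m)) for c in centered]
    return v, scores

# Points lying exactly on the line y = 2x have one true dimension.
direction, scores = first_principal_component([(0, 0), (1, 2), (2, 4), (3, 6)])
```

For this input the recovered direction is parallel to (1, 2), and the one-dimensional scores preserve the ordering of the points along the line.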


27.8 Examples 

We conclude this chapter with several examples of visualizing specific types of 
data using the techniques discussed above. 


27.8.1 Tables 

Tabular data is extremely common, as all spreadsheet users know. The goal
in visualization is to encode this information through easily perceivable visual
channels rather than forcing people to read through it as numbers and text.
Figure 27.20 shows the Table Lens, a focus+context approach where quantitative
values are encoded as the length of one-pixel high lines in the context regions,
and shown as numbers in the focus regions. Each dimension of the dataset is
shown as a column, and the rows of items can be resorted according to the values
in that column with a single click in its header.

Figure 27.20. The Table Lens provides focus+context interaction with tabular data,
immediately reorderable by the values in each dimension column. Image courtesy Stuart Card (Rao
& Card, 1994), © 1994 ACM, Inc. Included here by permission.

Figure 27.21. Hierarchical parallel coordinates show high-dimensional data at multiple
levels of detail. Image courtesy Matt Ward (Fua et al., 1999), © 1999 IEEE. (See also
Plate LI.)

The traditional Cartesian approach of a scatterplot, where items are plotted 
as dots with respect to perpendicular axes, is only usable for two and three
dimensions of data. Many tables contain far more than three dimensions of data,
and the number of additional dimensions that can be encoded using other visual
channels is limited. Parallel coordinates are an approach for visualizing more
dimensions at once using spatial position, where the axes are parallel rather than
perpendicular and an n-dimensional item is shown as a polyline that crosses each 
of the n axes once (Inselberg & Dimsdale, 1990; Wegman, 1990). Figure 27.21 
shows an 8-dimensional dataset of 230,000 items at multiple levels of detail (Fua 
et al., 1999), from a high-level view at the top to finer detail at the bottom. With 
hierarchical parallel coordinates, the items are clustered and an entire cluster of 
items is represented by a band of varying width and opacity, where the mean is in 
the middle and width at each axis depends on the values of the items in the cluster 
in that dimension. The coloring of each band is based on the proximity between 
clusters according to a similarity metric. 
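
As a rough illustration of the basic (non-hierarchical) encoding, the following Python sketch maps one n-dimensional item to the vertices of its polyline. The function name, the per-axis min/max normalization, and the unit spacing between axes are assumptions made for this example; they are not taken from the cited papers.

```python
def parallel_coordinates_polyline(item, mins, maxs, axis_spacing=1.0):
    """Return the (x, y) vertices of the polyline for one n-dimensional item.

    Axis k is a vertical line at x = k * axis_spacing; the item's value in
    dimension k is normalized to [0, 1] along that axis.
    """
    points = []
    for k, (value, lo, hi) in enumerate(zip(item, mins, maxs)):
        # Guard against a degenerate axis whose values are all identical.
        y = 0.5 if hi == lo else (value - lo) / (hi - lo)
        points.append((k * axis_spacing, y))
    return points


# One item from a 3-dimensional table, with assumed per-dimension ranges.
vertices = parallel_coordinates_polyline([1, 5, 10], mins=[0, 0, 0], maxs=[2, 10, 10])
```

Drawing every item's polyline over the same set of axes gives the parallel coordinates view; the hierarchical variant would replace individual polylines with one band per cluster.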


27.8.2 Graphs 

The field of graph drawing is concerned with finding a spatial position for the 
nodes in a graph in 2D or 3D space and routing the edges between these nodes 
(Di Battista et al., 1999). In many cases the edge-routing problem is simplified 
by using only straight edges, or by only allowing right-angle bends for the 
class of orthogonal layouts, but some approaches handle true curves. If the graph 
has directed edges, a layered approach can be used to show hierarchical structure 
through the horizontal or vertical spatial ordering of nodes, as shown in 
Figure 27.2. 



A suite of aesthetic criteria operationalizes human judgements about readable 
graphs as metrics that can be computed on a proposed layout (Ware et al., 2002). 
Figure 27.22 shows some examples. Some metrics should be minimized, such 
as the number of edge crossings, the total area of the layout, and the number of 
right-angle bends or curves. Others should be maximized, such as the angular 
resolution or symmetry. The problem is difficult because most of these criteria 
are individually NP-hard to optimize, and moreover they are mutually incompatible 
(Brandenburg, 1988). 
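
Evaluating one of these metrics on a given layout is easy, even though optimizing it is not. The sketch below counts edge crossings for a straight-edge layout by brute force over all edge pairs. The names `pos` and `edges` are assumptions for this example, and the test deliberately ignores collinear overlaps and edges that share an endpoint.

```python
def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    val = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (val > 0) - (val < 0)


def count_edge_crossings(pos, edges):
    """Count pairs of straight edges that properly cross in the layout `pos`."""
    crossings = 0
    for i in range(len(edges)):
        a, b = edges[i]
        for j in range(i + 1, len(edges)):
            c, d = edges[j]
            if len({a, b, c, d}) < 4:
                continue  # edges that share a node meet there but do not cross
            p, q, r, s = pos[a], pos[b], pos[c], pos[d]
            # Two segments properly cross when each one separates the
            # endpoints of the other.
            if (_orient(p, q, r) * _orient(p, q, s) < 0
                    and _orient(r, s, p) * _orient(r, s, q) < 0):
                crossings += 1
    return crossings
```

A layout algorithm could use such a function to score candidate layouts, at a cost quadratic in the number of edges per evaluation.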

Many approaches to node-link graph drawing use force-directed placement, 
motivated by the intuitive physical metaphor of spring forces at the edges drawing 
together repelling particles at the nodes. Although naive approaches have high 
time complexity and are prone to being caught in local minima, much work has 
gone into developing more sophisticated algorithms such as GEM (Frick et al., 
1994) or IPSep-CoLa (Dwyer et al., 2006). Figure 27.23 shows an interactive 
system using the r-PolyLog energy model, where a focus+context view of the 
clustered graph is created with both geometric and semantic fisheye (van Ham & 
van Wijk, 2004). 
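
To make the spring metaphor concrete, here is a minimal sketch of one iteration of a naive force-directed layout. It is not GEM, IPSep-CoLa, or the r-PolyLog model. The inverse-square repulsion, the linear spring, and all parameter names and default values are assumptions chosen for illustration.

```python
import math


def force_directed_step(pos, edges, rest_length=1.0, repulsion=1.0, step=0.05):
    """Return new node positions after one naive O(n^2) force-directed iteration."""
    force = {v: [0.0, 0.0] for v in pos}
    nodes = list(pos)
    # Every pair of nodes repels with an inverse-square force.
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            dist = math.hypot(dx, dy) or 1e-9  # avoid division by zero
            f = repulsion / (dist * dist)
            force[u][0] += f * dx / dist
            force[u][1] += f * dy / dist
            force[v][0] -= f * dx / dist
            force[v][1] -= f * dy / dist
    # Each edge acts as a spring that pulls or pushes toward rest_length.
    for u, v in edges:
        dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
        dist = math.hypot(dx, dy) or 1e-9
        f = dist - rest_length
        force[u][0] += f * dx / dist
        force[u][1] += f * dy / dist
        force[v][0] -= f * dx / dist
        force[v][1] -= f * dy / dist
    # Move each node a small distance along its net force.
    return {v: (pos[v][0] + step * force[v][0], pos[v][1] + step * force[v][1])
            for v in pos}
```

Repeating the step until positions stop changing gives a layout at a local energy minimum. The all-pairs repulsion loop is the source of the high time complexity mentioned above.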




Figure 27.22. Graph layout aesthetic criteria. Top: Edge crossings should be 
minimized. Middle: Angular resolution should be maximized. Bottom: Symmetry 
is maximized on the left, whereas crossings are minimized on the right, showing 
the conflict between the individually NP-hard criteria. 



Figure 27.23. Force-directed placement showing a clustered graph with both geometric 
and semantic fisheye. Image courtesy Jarke van Wijk (van Ham & van Wijk, 2004), © 2004 
IEEE. 

Figure 27.24. Graphs can be shown with either matrix or node-link views. Image courtesy 
Jean-Daniel Fekete (Henry & Fekete, 2006), © 2006 IEEE. 


Graphs can also be visually encoded by showing the adjacency matrix, where 
all vertices are placed along each axis and the cell between two vertices is colored 
if there is an edge between them. The MatrixExplorer system uses linked multiple 
views to help social science researchers visually analyze social networks with 
both matrix and node-link representations (Henry & Fekete, 2006). Figure 27.24 
shows the different visual patterns created by the same graph structure in these 
two views: A represents an actor connecting several communities; B is a community; 
and C is a clique, or a complete subgraph. Matrix views do not suffer 
from cluttered edge crossings, but many tasks including path following are more 
difficult with this approach. 
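
The matrix encoding is simple to build from an edge list. The sketch below assumes an undirected graph with vertices numbered from zero; the function names are illustrative. It also shows why a clique is easy to spot in the matrix view: every off-diagonal cell among its vertices is filled.

```python
def adjacency_matrix(num_vertices, edges):
    """Build a symmetric 0/1 adjacency matrix for an undirected graph."""
    matrix = [[0] * num_vertices for _ in range(num_vertices)]
    for u, v in edges:
        matrix[u][v] = 1
        matrix[v][u] = 1  # undirected: mirror across the diagonal
    return matrix


def is_clique(matrix, vertices):
    """True if every pair of distinct vertices in `vertices` is connected."""
    return all(matrix[u][v] for u in vertices for v in vertices if u != v)


# A path 0-1-2: vertices 0 and 1 form a clique, but 0, 1, 2 do not.
m = adjacency_matrix(3, [(0, 1), (1, 2)])
```

How recognizable such blocks are depends on the vertex ordering along the axes, which this sketch leaves at the input order.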


27.8.3 Trees 

Trees are a special case of graphs so common that a great deal of visualization 
research has been devoted to them. A straightforward algorithm to lay out trees in 
the two-dimensional plane works well for small trees (Reingold & Tilford, 1981), 
while a more complex but scalable approach runs in linear time (Buchheim et 
al., 2002). Figures 27.17 and 27.18 also show trees with different approaches 
to spatial layout, but all four of these methods visually encode the relationship 
between parent and child nodes by drawing a link connecting them. 






































Figure 27.25. Treemap showing a filesystem of nearly one million files. Image courtesy 
Jean-Daniel Fekete (Fekete & Plaisant, 2002), © 2002 IEEE. (See also Plate LII.) 


Treemaps use containment rather than connection to show the hierarchical 
relationship between parent and child nodes in a tree (B. Johnson & Shneiderman, 
1991). That is, treemaps show child nodes nested within the outlines of 
the parent node. Figure 27.25 shows a hierarchical filesystem of nearly one million 
files, where file size is encoded by rectangle size and file type is encoded by 
color (Fekete & Plaisant, 2002). The size of nodes at the leaves of the tree can 
encode an additional data dimension, but the size of nodes in the interior does not 
show the value of that dimension; it is dictated by the cumulative size of their 
descendants. Although tasks such as understanding the topological structure of the 
tree or tracing paths through it are more difficult with treemaps than with node-link 
approaches, tasks that involve understanding an attribute tied to leaf nodes 
are well supported. Treemaps are space-filling representations that are usually 
more compact than node-link approaches. 
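
A minimal sketch of a containment layout follows, using a simple alternating-split scheme (often called slice-and-dice). It is not the layout used for Figure 27.25. The tuple-based tree format and the function names are assumptions for this example. Note that an interior node's area is never specified directly; it is the sum over its descendants, as described above.

```python
def subtree_size(node):
    """Leaf: (name, size). Interior: (name, [children]); size is the sum of its descendants."""
    _, payload = node
    if isinstance(payload, list):
        return sum(subtree_size(child) for child in payload)
    return payload


def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Give each leaf a rectangle (x, y, w, h) nested inside its parent's rectangle."""
    if out is None:
        out = {}
    name, payload = node
    if not isinstance(payload, list):
        out[name] = (x, y, w, h)
        return out
    total = subtree_size(node)
    offset = 0.0
    for child in payload:
        frac = subtree_size(child) / total if total else 0.0
        if depth % 2 == 0:
            # Even levels: divide the parent's width among the children.
            slice_and_dice(child, x + offset * w, y, frac * w, h, depth + 1, out)
        else:
            # Odd levels: divide the parent's height among the children.
            slice_and_dice(child, x, y + offset * h, w, frac * h, depth + 1, out)
        offset += frac
    return out


tree = ("root", [("a", 2), ("dir", [("b", 1), ("c", 1)])])
rects = slice_and_dice(tree, 0.0, 0.0, 4.0, 2.0)
```

Each leaf's rectangle area is proportional to its size, so the display is completely filled. This simple scheme tends to produce long thin rectangles for deep or unbalanced trees.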


27.8.4 Geographic 

Many kinds of analysis such as epidemiology require understanding both geographic 
and nonspatial data. Figure 27.26 shows a tool for the visual analysis 
of a cancer demographics dataset that combines many of the ideas described in 
this chapter (MacEachren et al., 2003). The top matrix of linked views features 
small multiples of three types of visual encodings: geographic maps showing 
Appalachian counties at the lower left, histograms across the diagonal of the matrix, 
and scatterplots on the upper right. The bottom 2×2 matrix, linking scatterplots 
with maps, includes the color legend for both. The discrete bivariate sequential 
colormap has lightness increasing sequentially for each of two complementary 
hues and is effective for color-deficient people. 

Figure 27.26. Two matrices of linked small multiples showing cancer demographic 
data (MacEachren et al., 2003), © 2003 IEEE. (See also Plate LIII). 


27.8.5 Spatial Fields 

Most nongeographic spatial data is modeled as a field, where there are one or more 
values associated with each point in 2D or 3D space. Scalar fields, for example 
CT or MRI medical imaging scans, are usually visualized by finding isosurfaces 
or using direct volume rendering. Vector fields, for example flows in water or air, 
are often visualized using arrows, streamlines (McLouglin et al., 2009), and line 
integral convolution (LIC) (Laramee et al., 2004). Tensor fields, such as those 
describing the anisotropic diffusion of molecules through the human brain, are 
particularly challenging to display (Kindlmann et al., 2000). In the next chapter, 
spatial fields are discussed in detail. 

Frequently Asked Questions 

• What conferences and journals are good places to look for further infor¬ 
mation about visualization? 

The IEEE VisWeek conference comprises three subconferences: InfoVis (Information 
Visualization), Vis (Visualization), and VAST (Visual Analytics Science 
and Technology). There is also a European EuroVis conference and an Asian 
PacificVis venue. Relevant journals include IEEE TVCG (Transactions on Visualization 
and Computer Graphics) and Palgrave Information Visualization. 

• What software and toolkits are available for visualization? 

The most popular toolkit for spatial data is vtk, a C/C++ codebase available at 
www.vtk.org. For abstract data, the Java-based prefuse (http://www.prefuse.org) 
and Processing (processing.org) toolkits are becoming widely used. The 
ManyEyes site from IBM Research (www.many-eyes.com) allows people to upload 
their own data, create interactive visualizations in a variety of formats, and 
carry on conversations about visual data analysis. 




Spatial-Field Visualization 


The topic of visualization was introduced in the previous chapter, together with 
visual encodings appropriate for a wide range of types of data. For many visualization 
applications, the main challenge lies in finding the appropriate spatial 
mapping of the data, but in other cases the data comes with a natural mapping. 
For instance, a photograph is a set of measured data that has an obvious visualization: 
simply display it on the screen. However, other ways of displaying the 
data may be useful as well, depending on what the user is trying to learn from it. 
An X-ray radiograph used to diagnose a broken bone is another example of a 2D 
image that is normally displayed directly. 

An X-ray is a 2D scalar field: a dataset that describes a function $\mathbb{R}^2 \to \mathbb{R}$, 
in this case representing a projection of the density of a patient’s body onto a 
plane. Other kinds of medical images, such as computed tomography (CT) images 
or magnetic resonance images (MRIs), are 2D scalar fields that describe slices 
through a patient’s body rather than projections. If many closely spaced slices 
are measured, then the resulting dataset is a 3D scalar field, or volume dataset, 
representing a function $\mathbb{R}^3 \to \mathbb{R}$. This type of data can be displayed one slice 
at a time, but it also invites perspective or orthographic views that can provide 
additional insight into 3D shape. 

The importance of scalar fields has led to a number of special techniques and 
algorithms, particularly for rendering 3D views of volume data. As with other 
kinds of visualization, the primary goal is to map the relevant features of the data 
into visual features that play to the strengths of the human visual system. 



Figure 28.1. A contour plot for four levels of the function $1 - x^2 - y^2$. 

Figure 28.2. A random density plot for four levels of the function $1 - x^2 - y^2$. 

Figure 28.3. A grayscale density plot of the function $1 - x^2 - y^2$. 


28.1 2D Scalar Fields 


For simplicity, assume that our 2D scalar data is defined as 

$$
f(x,y) =
\begin{cases}
1 - x^2 - y^2, & \text{if } x^2 + y^2 < 1, \\
0 & \text{otherwise,}
\end{cases}
\tag{28.1}
$$

over the square $(x, y) \in [-1, 1]^2$. In practice, we often have a sampled representation 
on a rectilinear grid that we interpolate to get a continuous field. We will 
ignore that issue in 2D for simplicity. 

One way to visualize a 2D field is to draw lines at a finite set of values 
$f(x,y) = f_i$ (shown for the function in Equation 28.1 in Figure 28.1). This 
is done on many topographic maps to indicate elevation. Isocontours are excellent 
at communicating slope, but are hard to read “globally” to understand large 
trends and extrema in the data. 

Another common way to visualize 2D data is to use small pseudorandom dots 
whose density is proportional to the value of the function. This is shown for our 
test function in Figure 28.2. Such random density plots are useful for display on 
black-and-white media, but are otherwise usually not a good choice for visualization. 
Random density plots look smoother and smoother as more and smaller dots 
are used while maintaining overall density. As the dot size shrinks below human visual 
acuity, the image looks smooth. This results in a grayscale continuous tone plot 
of the function. It is hard for humans to read such plots, because our ability to 
detect absolute intensity levels is poor. For this reason, color or thresholding is 
often used. This is shown in grayscale in Figure 28.3. Formally, we can specify 
such a mapping with just a function $g$ that maps scalar values to colors: 

$$ g : \mathbb{R} \mapsto [0, 1]^3. $$

Here $[0, 1]^3$ refers to the RGB cube. A common strategy is to specify a set of 
colors to which specific values map and linearly interpolate colors between them. 
A set of colors that increases in intensity and cycles in hue is often used. Such a 
set of colors for the domain $[0,1]$ is 

$$
\begin{aligned}
g(0.00) &= (0.0, 0.0, 0.0), \\
g(0.25) &= (0.0, 0.0, 1.0), \\
g(0.50) &= (1.0, 0.0, 0.0), \\
g(0.75) &= (1.0, 1.0, 0.0), \\
g(1.00) &= (1.0, 1.0, 1.0).
\end{aligned}
$$

These plots are often called pseudocolor displays. We can also display the function 
as a height plot as shown in Figure 28.4. This type of plot is good for showing 
the shape of a function. Note that this plot makes it more obvious that the function 
is radially symmetric. 
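
Returning to the color map, the following sketch evaluates $g$ by linear interpolation between the five color stops listed earlier. The names `COLOR_STOPS` and `pseudocolor`, and the choice to clamp inputs outside $[0,1]$, are assumptions for this example.

```python
# (value, (r, g, b)) stops taken from the list of g values in the text.
COLOR_STOPS = [
    (0.00, (0.0, 0.0, 0.0)),
    (0.25, (0.0, 0.0, 1.0)),
    (0.50, (1.0, 0.0, 0.0)),
    (0.75, (1.0, 1.0, 0.0)),
    (1.00, (1.0, 1.0, 1.0)),
]


def pseudocolor(value):
    """Map a scalar in [0, 1] to RGB by piecewise-linear interpolation between stops."""
    value = min(max(value, 0.0), 1.0)  # clamp out-of-range inputs (an assumption)
    for (v0, c0), (v1, c1) in zip(COLOR_STOPS, COLOR_STOPS[1:]):
        if value <= v1:
            t = (value - v0) / (v1 - v0)
            return tuple((1.0 - t) * a + t * b for a, b in zip(c0, c1))
    return COLOR_STOPS[-1][1]


def f(x, y):
    """The test function of Equation 28.1."""
    return 1.0 - x * x - y * y if x * x + y * y < 1.0 else 0.0


center_color = pseudocolor(f(0.0, 0.0))
```

Applying `pseudocolor(f(x, y))` at every pixel of the square $[-1,1]^2$ would produce a pseudocolor display of the test function.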

Often, more than one of these methods are used together in a single image, 
such as a colored or contoured height plot. Another hybrid technique that is often 
used is to shade the height plot and view it orthogonally from above. This is a 
shaded relief map, often used for geographical applications. 


28.2 3D Scalar Fields 

In 3D we can use some of the same techniques as in 2D. We can make a contour 
plot, where each contour is a 3D surface called an isosurface. We can also 
generalize a random density plot to 3D by scattering particles in 3D. If we take 
the limit, as we did in 2D to get a pseudocolor display, then we get direct volume 
rendering. These two methods are covered here. It is not clear how to generalize 
height plots, because we have run out of dimensions. 


28.2.1 Isosurfaces 

Given a 3D scalar field $f(x, y, z)$ we can create an isosurface for $f(x, y, z) = f_0$. 
In practice, we will have $f$ defined in a 3D rectilinear table that we interpolate for 
intermediate values. An example image is shown in Figure 28.5. 

There are two basic approaches to creating images of isosurfaces. The first is 
to explicitly create a polygonal representation of the isosurface and then render 
that representation using standard rendering techniques. The second is to use ray 
tracing to create an image by direct intersection calculation. In ray tracing, no 
explicit surface is computed. The explicit approach is better when we have small 
datasets, or we need the isosurface itself rather than just an image of it. The ray 
tracing approach is better for large datasets where we just need the image of the 
isosurface. 

Creating Polygonal Isosurfaces 

The basic idea of creating polygonal isosurfaces treats every rectilinear cell as a 
separate problem (Wyvill et al., 1986; W. E. Lorensen & Cline, 1987). Given an 



Figure 28.4. A height plot of the function. 

Figure 28.5. An isosurface from the NIH/NLM Visible Female data set. 

Figure 28.6. Three cases for polygonal isosurfacing. The black vertices are on 
one side of the isovalue, and the white on the other. 


isovalue $f_0$, there is a surface in the cell if the minimum and maximum of the 
eight vertex values bracket $f_0$. What surfaces occur depends on the arrangement 
of values above and below $f_0$. This is shown for three cases in Figure 28.6. 

There are a total of $2^8 = 256$ cases for vertices above and below the isovalue. 
We can just enumerate all the cases in a table, and do a look-up. We can also 
take advantage of some symmetries to reduce the table size. For example, if we 
reverse above/below vertices, we can halve the table size. If we are willing to do 
flips and rotations, we can reduce the table to size 16, where only 15 of the cases 
have polygons. 
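
The table look-up can be sketched as follows: the above/below flags of the eight vertices are packed into one byte that indexes the case table. The vertex ordering and function names here are assumptions; a real implementation must use the same ordering as its triangle table, which is not reproduced here.

```python
def cube_case_index(corner_values, isovalue):
    """Pack the above/below flag of each of the 8 cell vertices into an index in 0..255."""
    index = 0
    for bit, value in enumerate(corner_values):
        if value > isovalue:
            index |= 1 << bit
    return index


def cell_has_surface(case_index):
    """Cases 0 and 255 have all vertices on one side, so no surface crosses the cell."""
    return case_index not in (0, 255)


def complement_case(case_index):
    """Swapping above and below flips every bit; this symmetry halves the table."""
    return case_index ^ 0xFF


case = cube_case_index([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0], isovalue=0.5)
```

A full polygonizer would use `case` to look up which cell edges the surface crosses, then place a vertex on each such edge by linear interpolation of the two endpoint values.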


Although the above algorithm, usually called marching cubes, is elegant and simple, 
some care must be taken to ensure accurate results (Nielson, 2003). 

Ray Tracing 

The algorithm for intersecting a ray with an isosurface has three phases: traversing 
a ray through cells which do not contain an isosurface, analytically computing 
the isosurface intersection when the ray enters a voxel containing the isosurface, and shading 
the resulting intersection point (Lin & Ching, 1996; Parker, Parker, et al., 1999). 
This process is repeated for each pixel on the screen. 

To find an intersection, the ray $\mathbf{a} + t\mathbf{b}$ traverses cells in the volume, checking 
each cell to see if its data range bounds the isovalue. If it does, an analytic computation 
is performed to solve for the ray parameter $t$ at the intersection with the 
isosurface: 

$$ \rho(x_a + t x_b,\; y_a + t y_b,\; z_a + t z_b) - \rho_{iso} = 0. $$

When approximating $\rho$ with a trilinear interpolation between discrete grid points, 
this equation will expand to a cubic polynomial in $t$. This cubic can then be 
solved in closed form to find the intersections of the ray with the isosurface in 
that cell. Only the roots of the polynomial which are contained in the cell are 
examined. There may be multiple roots corresponding to multiple intersection 
points. In this case, the smallest $t$ (closest to the eye) is used. There may also 
be no roots of the polynomial, in which case the ray misses the isosurface in the 
cell. 

A rectilinear volume is composed of a three-dimensional array of point samples 
that are aligned to the Cartesian axes and are equally spaced in a given dimension. 
A single cell from such a volume is shown in Figure 28.7. Other cells can 
be generated by exchanging indices $(i, j, k)$ for the zeros and ones in the figure. 

Figure 28.7. The geometry for a cell. A “nice” uvw coordinate system is used to make 
interpolation math cleaner. 


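The density at a point within the cell is found using trilinear interpolation: 

$$
\begin{aligned}
\rho(u,v,w) ={}& (1-u)(1-v)(1-w)\,\rho_{000} + (1-u)(1-v)\,w\,\rho_{001} \\
&+ (1-u)\,v\,(1-w)\,\rho_{010} + u\,(1-v)(1-w)\,\rho_{100} \\
&+ u\,(1-v)\,w\,\rho_{101} + (1-u)\,v\,w\,\rho_{011} \\
&+ u\,v\,(1-w)\,\rho_{110} + u\,v\,w\,\rho_{111},
\end{aligned}
\tag{28.2}
$$

where 

$$
u = \frac{x - x_0}{x_1 - x_0}, \qquad
v = \frac{y - y_0}{y_1 - y_0}, \qquad
w = \frac{z - z_0}{z_1 - z_0}.
\tag{28.3}
$$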



Figure 28.9. A true trilinear isosurface generated using direct ray tracing. 



Figure 28.8. Various coordinate systems used for interpolation and intersection. 


Note that 

$$
1 - u = \frac{x_1 - x}{x_1 - x_0}, \qquad
1 - v = \frac{y_1 - y}{y_1 - y_0}, \qquad
1 - w = \frac{z_1 - z}{z_1 - z_0}.
\tag{28.4}
$$

If we redefine $u_0 = 1 - u$ and $u_1 = u$, and use similar definitions for 
$v_0, v_1, w_0, w_1$, then we get (Figure 28.8) 

$$ \rho = \sum_{i,j,k=0,1} u_i v_j w_k\, \rho_{ijk}. $$
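
The compact sum translates almost directly into code. In this sketch the corner densities are stored as a nested list `rho[i][j][k]`; the storage layout and the function name are assumptions for illustration.

```python
def trilinear(rho, u, v, w):
    """Evaluate rho = sum over i,j,k in {0,1} of u_i v_j w_k rho[i][j][k].

    Uses u_0 = 1 - u and u_1 = u, and likewise for v and w.
    """
    us = (1.0 - u, u)
    vs = (1.0 - v, v)
    ws = (1.0 - w, w)
    return sum(us[i] * vs[j] * ws[k] * rho[i][j][k]
               for i in (0, 1) for j in (0, 1) for k in (0, 1))


# Corner densities equal to the corner's i index: the interpolant is then just u.
rho = [[[float(i) for k in (0, 1)] for j in (0, 1)] for i in (0, 1)]
value = trilinear(rho, 0.25, 0.5, 0.5)
```

At a cell corner exactly one product $u_i v_j w_k$ equals one and the rest vanish, so the interpolant reproduces the stored samples.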


It is interesting that the true trilinear isosurface can be fairly complex. The case 
where two opposite corners of the cube are on opposite sides of the isovalue from 
the other six vertices is shown in Figure 28.9. This is quite different from the two 
triangles given by polygonal isosurfacing for that case. One advantage of direct 
intersection with the trilinear surface is that ambiguous cases do not arise. 

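For a given point $(x, y, z)$ in the cell, the surface normal is given by the gradient 
with respect to $(x, y, z)$: 

$$
\mathbf{N} = \nabla\rho = \left( \frac{\partial\rho}{\partial x}, \frac{\partial\rho}{\partial y}, \frac{\partial\rho}{\partial z} \right).
$$

Thus, the normal vector $(N_x, N_y, N_z) = \nabla\rho$ is 

$$
\begin{aligned}
N_x &= \sum_{i,j,k=0,1} \frac{(-1)^{i+1}\, v_j w_k}{x_1 - x_0}\, \rho_{ijk}, \\
N_y &= \sum_{i,j,k=0,1} \frac{(-1)^{j+1}\, u_i w_k}{y_1 - y_0}\, \rho_{ijk}, \\
N_z &= \sum_{i,j,k=0,1} \frac{(-1)^{k+1}\, u_i v_j}{z_1 - z_0}\, \rho_{ijk}.
\end{aligned}
$$
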
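Given a ray $\mathbf{p} = \mathbf{a} + t\mathbf{b}$, the intersection with the isosurface occurs when 
$\rho(\mathbf{p}) = \rho_{iso}$. We can convert this ray into coordinates defined by $(u_0, v_0, w_0)$: 
$\mathbf{p}_0 = \mathbf{a}_0 + t\mathbf{b}_0$ and a second ray defined by $\mathbf{p}_1 = \mathbf{a}_1 + t\mathbf{b}_1$. Here the rays are in 
the two coordinate systems (Figure 28.8): 

$$
\mathbf{a}_0 = (u_0^a, v_0^a, w_0^a) = \left( \frac{x_1 - x_a}{x_1 - x_0}, \frac{y_1 - y_a}{y_1 - y_0}, \frac{z_1 - z_a}{z_1 - z_0} \right),
$$

and 

$$
\mathbf{b}_0 = (u_0^b, v_0^b, w_0^b) = \left( \frac{-x_b}{x_1 - x_0}, \frac{-y_b}{y_1 - y_0}, \frac{-z_b}{z_1 - z_0} \right).
$$

These equations differ because $\mathbf{a}_0$ is a location and $\mathbf{b}_0$ is a direction. The 
equations are similar for $\mathbf{a}_1$ and $\mathbf{b}_1$: 

$$
\mathbf{a}_1 = (u_1^a, v_1^a, w_1^a) = \left( \frac{x_a - x_0}{x_1 - x_0}, \frac{y_a - y_0}{y_1 - y_0}, \frac{z_a - z_0}{z_1 - z_0} \right),
$$

and 

$$
\mathbf{b}_1 = (u_1^b, v_1^b, w_1^b) = \left( \frac{x_b}{x_1 - x_0}, \frac{y_b}{y_1 - y_0}, \frac{z_b}{z_1 - z_0} \right).
$$
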

Note that $t$ is the same for all three rays; it can be found by traversing the cells and 
doing a brute-force algebraic solution for $t$. The intersection with the isosurface 
$\rho(\mathbf{p}) = \rho_{iso}$ occurs when 

$$
\rho_{iso} = \sum_{i,j,k=0,1} \left(u_i^a + t u_i^b\right) \left(v_j^a + t v_j^b\right) \left(w_k^a + t w_k^b\right) \rho_{ijk}.
$$

This can be simplified to a cubic polynomial in $t$: 

$$ At^3 + Bt^2 + Ct + D = 0, $$

where 

$$
\begin{aligned}
A &= \sum_{i,j,k=0,1} u_i^b v_j^b w_k^b\, \rho_{ijk}, \\
B &= \sum_{i,j,k=0,1} \left( u_i^a v_j^b w_k^b + u_i^b v_j^a w_k^b + u_i^b v_j^b w_k^a \right) \rho_{ijk}, \\
C &= \sum_{i,j,k=0,1} \left( u_i^b v_j^a w_k^a + u_i^a v_j^b w_k^a + u_i^a v_j^a w_k^b \right) \rho_{ijk}, \\
D &= -\rho_{iso} + \sum_{i,j,k=0,1} u_i^a v_j^a w_k^a\, \rho_{ijk}.
\end{aligned}
$$

The solution to a cubic polynomial is discussed in Cubic and Quartic Roots 
(Schwarze, 1990). His code is available on the web in several Graphics Gems 
archive sites. Two modifications are needed to use it: degenerate cases in which $A$ is 
zero must be handled (his code assumes $A$ is nonzero), and the EQN_EPS parameter 
is set to 1.0e-30, which provided maximum stability for large coefficients. 
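
The coefficient formulas can be accumulated in a single pass over the eight cell corners. The sketch below is one possible arrangement under assumed names (`ray_cell_cubic`, `lo`, `hi`); it only builds $A$, $B$, $C$, $D$ and leaves root finding to a separate cubic solver such as the one cited above.

```python
from itertools import product


def ray_cell_cubic(rho, lo, hi, a, b, rho_iso):
    """Coefficients (A, B, C, D) of A t^3 + B t^2 + C t + D = 0 for the ray a + t b.

    rho[i][j][k] holds the corner densities of one cell whose corners are
    lo = (x0, y0, z0) and hi = (x1, y1, z1).
    """
    # For each axis, the ray origin and direction in both coordinate systems:
    # entry [0] is the (u_0, v_0, w_0) system, entry [1] the (u_1, v_1, w_1) system.
    origin, direction = [], []
    for axis in range(3):
        extent = hi[axis] - lo[axis]
        origin.append(((hi[axis] - a[axis]) / extent, (a[axis] - lo[axis]) / extent))
        direction.append((-b[axis] / extent, b[axis] / extent))
    A = B = C = 0.0
    D = -rho_iso
    for i, j, k in product((0, 1), repeat=3):
        ua, va, wa = origin[0][i], origin[1][j], origin[2][k]
        ub, vb, wb = direction[0][i], direction[1][j], direction[2][k]
        r = rho[i][j][k]
        A += ub * vb * wb * r
        B += (ua * vb * wb + ub * va * wb + ub * vb * wa) * r
        C += (ub * va * wa + ua * vb * wa + ua * va * wb) * r
        D += ua * va * wa * r
    return A, B, C, D
```

Evaluating the resulting polynomial at some $t$ should match trilinear interpolation at the point $\mathbf{a} + t\mathbf{b}$ minus $\rho_{iso}$, which is a convenient consistency check before the coefficients are passed to a solver.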


28.2.2 Direct Volume Rendering 


Another way to create a picture of a 3D scalar field is to do a 3D random density 
plot using small opaque spheres. To avoid complications, the spheres can be made 
a constant color and, in effect, they are light emitters with no reflectance. Such 
a random density plot can be implemented directly using ray tracing and small 
spheres, or with 3D points using a traditional graphics API. As in 2D, we can 
take the limit as the sphere size goes to zero. This yields a 3D analog of the 
pseudocolor display and is usually called direct volume rendering (Levoy, 1988; 
Drebin et al., 1988; Sabella, 1988; Upson & Keeler, 1988). 

There are two parameters that affect the appearance of a volume rendering: 
sphere color and sphere density. These are controlled by a user-specified transfer 
function: 

$$ \text{color} = c(\rho), \qquad \text{number density} = d(\rho). $$

Here the number density is the number of spheres per unit volume. If we assume 
that the spheres have a small cross-sectional area $a$, and we consider a region 
along the line of sight that is of a small thickness $\Delta s$ such that no spheres appear 
to overlap (Figure 28.10), then the color is 

$$ L(s + \Delta s) = (1 - F)\,L(s) + F c, $$



Figure 28.10. A thin slab 
filled with opaque spheres. 


where $F$ is the fraction of the disk that is covered by spheres as seen from the 
viewing direction. Because the disk is very thin, we can ignore spheres visually 
overlapping, so this fraction is just the total cross-sectional area of the spheres 
divided by the area $A$ of the disk: 

$$ F = \frac{d\,a\,A\,\Delta s}{A} = d\,a\,\Delta s, $$

which yields 

$$ L(s + \Delta s) = (1 - d\,a\,\Delta s)\,L(s) + d\,a\,\Delta s\,c. $$

Figure 28.11. For direct volume rendering, we can take constant size steps along the ray 
and numerically integrate. 


We can rearrange terms to give something like a definition of the derivative: 

$$ \frac{L(s + \Delta s) - L(s)}{\Delta s} = -d\,a\,L(s) + d\,a\,c. $$

If we take the limit $\Delta s \to 0$, we get a differential equation: 

$$ \frac{dL}{ds} = -d\,a\,L(s) + d\,a\,c. $$

For constant $d$ and $c$ this equation has the solution 

$$ L(s) = L(0)\,e^{-das} + c\left(1 - e^{-das}\right). $$


This would allow us to analytically compute color for constant density/color regions. 
However, in practice both $d$ and $c$ vary along the ray, and there is no 
analytic solution to the differential equation. So, in practice, we use a numerical 
technique. A simple way to proceed is to start at the back of the ray and 
incrementally step along the ray as shown in Figure 28.11. 

We can apply the original equation for each $\Delta s$ slice: 

$$ L(s + \Delta s) = \left(1 - d(x, y, z)\,a\,\Delta s\right) L(s) + d(x, y, z)\,a\,\Delta s\,c(x, y, z). $$

In pseudocode, we initialize the color to the background color $c_b$, and then traverse 
the volume from back to front: 

find volume entry and exit points a and b 
L = c_b 
Δs = distance(a, b) / N 
p = b 
for i = 1 to N do 
    p = p - Δs (b - a) / distance(a, b) 
    L = (1 - d(p) a Δs) L + d(p) a Δs c(p) 

Figure 28.12. A maximum-intensity projection of the NIH/NLM Visible Female dataset. Each 
pixel contains a grayscale value that corresponds to the maximum density encountered along 
that ray. Image courtesy Steve Parker. 

The step size $\Delta s$ will determine the quality of the integration. To reduce the 
number of variables, we can use a new density function $g(\rho) = d(\rho)\,a$. 
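
The back-to-front loop can be sketched in Python for a single ray and a single color channel. Here the density and color are given as functions of distance $s$ from the back of the ray; the function names and the midpoint sampling are assumptions of this example. For constant $d$ and $c$ the result can be compared against the closed-form solution above.

```python
import math


def march_ray(density, color, length, background, num_steps, area=1.0):
    """Integrate L(s + ds) = (1 - d a ds) L(s) + d a ds c from back (s = 0) to front."""
    ds = length / num_steps
    L = background
    for i in range(num_steps):
        s = (i + 0.5) * ds  # sample at the middle of each slab
        alpha = density(s) * area * ds  # fraction of the slab covered by spheres
        L = (1.0 - alpha) * L + alpha * color(s)
    return L


def analytic_constant(d, c, length, background, area=1.0):
    """Closed-form L(s) = L(0) e^{-das} + c (1 - e^{-das}) for constant d and c."""
    attenuation = math.exp(-d * area * length)
    return background * attenuation + c * (1.0 - attenuation)


numeric = march_ray(lambda s: 2.0, lambda s: 0.8, 1.5, 0.1, num_steps=10000)
exact = analytic_constant(2.0, 0.8, 1.5, 0.1)
```

With a coarse step the numerical result drifts away from the analytic one, which illustrates how the step size controls the quality of the integration.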

In some applications direct volume rendering is used to render something similar 
to surfaces. In these cases the transfer function on density is “on” or “off” and 
the gradient of the number density is used to get a surface normal for shading. 
This can produce images of pseudosurfaces that are less sensitive to noise than 
traditional isosurfacing. 

Another way to do volume rendering is maximum-intensity projection. Here, 
we set each pixel to the maximum density value encountered along a ray. This 
turns the ray integration into a search along the ray, which is more efficient. 
Figure 28.12 shows an image generated using maximum-intensity projection. 
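
For the simplest viewing setup the search reduces to a maximum over one array axis. The sketch below assumes an orthographic view along the z axis of a volume stored as nested lists `volume[z][y][x]`; both the layout and the function name are assumptions for illustration.

```python
def maximum_intensity_projection(volume):
    """Project volume[z][y][x] along z: each pixel keeps the largest density on its ray."""
    depth = len(volume)
    height = len(volume[0])
    width = len(volume[0][0])
    return [[max(volume[z][y][x] for z in range(depth)) for x in range(width)]
            for y in range(height)]


# Two 2x2 slices stacked along z.
volume = [
    [[1, 2], [3, 4]],
    [[5, 0], [1, 9]],
]
image = maximum_intensity_projection(volume)
```

Because the result does not depend on the order in which samples are visited, no compositing order has to be maintained along the ray.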

Frequently Asked Questions 

• What is the best transfer function for direct volume rendering? 

The answer depends highly on the application and the characteristics of the data. 
Some empirical tests have been run and can be found in (Pfister et al., 2001). Various 
optical models used in direct volume rendering are described in (Max, 1995). 

• What do I do to visualize vector or tensor data? 

Vector data is often visualized using streamlines, arrows, and line-integral convolution 
(LIC). Such techniques are surveyed in (Interrante & Grosch, 1997). Tensor 
data is more problematic. Even simple diffusion tensor data is hard to visualize 
effectively because you just run out of display dimensions for mapping of data 
dimensions. See (Kindlmann et al., 2000). 

• How do I interactively view a volume by changing isovalues? 

One way is to use ray tracing on a parallel machine. The other is to use polygonal 
isosurfacing with a preprocess that helps search for cells containing an isosurface. 
That search can be implemented using the data structure in (Livnat et al., 1996). 

• My volume data is unstructured tetrahedra. How do I do isosurfacing or 
direct volume rendering? 

Isosurfacing can still be done in a polygonal fashion, but there are fewer cases to 
preprocess. Ray tracing can also be used for isosurfacing or direct volume rendering, 
but the traversal algorithm must progress through the unstructured data 
either using neighbor pointers (Garrity, 1990) or by adding cells to an efficiency 
structure (Parker, Parker, et al., 1999). 

• What is “splatting” for direct volume rendering? 

Splatting refers to projecting semitransparent voxels onto the screen using some 
sort of painter's algorithm (Laur & Hanrahan, 1991). 


Exercises 

1. If we have a tetrahedral data element with densities at each of the four 
vertices, how many “cases” are there for polygonal isosurfaces? 

2. Suppose we have $n^3$ data elements in a volume. If the densities in the 
volume are “well behaved,” approximately how many cells will contain an 
isosurface for a particular isovalue? 

3. Should we add shadowing to direct volume rendering? Why or why not? 